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In  many  cases,  the  reference  cited  are  not  inclusive, 
but  are  prominent  in  the  field  or  contain  extensive  bib¬ 
liographies.  The  Bibliography  in  this  dissertation  is 
more  detailed  than  the  text  references  and  is  Intended  to 
provide  a  good  foundation  for  those  wishing  to  further 
research  the  field  of  non-parametric  density  estimation. 
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Abs  tract 

A  new  non-parametric  probability  density  estimator  Is 
developed  which  has  the  following  properties: 

1)  It  yields  a  continuous,  non-negative  and  piecewise 
linear  estimate. 

2)  It  converges  to  the  true  density  function  if  the 
true  density  has  no  more  than  a  finite  number  of  discon¬ 
tinuities  of  a  form  where  the  value  of  the  function  at  the 
discontinuity  can  be  considered  the  average  of  the 
limiting  values  on  either  side  of  the  discontinuity. 

3)  It  requires  no  user  supplied  parameters. 

The  estimator  is  shown  to  have  significantly  better 
error  properties,  for  certain  classes  of  distributions, 
than  existing  density  estimators.  The  quality  of  the 
estimate  is  discussed,  tabulated  and  graphically  demon¬ 
strated.  Applications,  including  parameterization,  small 
sample  analysis,  and  two  sample  tests  are  presented. 
These  newly  developed  applications  are  shown  to  improve 
upon  the  generally  accepted  existing  techniques.  Guide¬ 
lines  for  choosing  a  density  estimation  method  along  with 
an  organized  approach  to  method  selection  are  discussed. 


I.  Introduction 


The  historical  development  of  non -pa r a m e t r i c  prob¬ 
ability  density  function  estimators  stems  from  the  histo¬ 
gram  type  estimator  which  was  inspired  by  John  Graunt  and 
further  developed  by  mathematicians  such  as  Petty, 
Huygens,  van  Dael  and  Halley  (230).  Density  estimation 
has  been  attempted  by  distinguished  statisticians  includ¬ 
ing  Pearson,  Gossett,  Fisher,  Johnson,  et.al.(  52 , 14  3,203). 
Their  methods  include  methods  of  parameterization,  kernel 
estimators,  distance  estimation,  entropy  methods,  spline 
techniques  and  series  estimators.  This  dissertation  pre¬ 
sents  a  new  non-parametric  density  estimator. 

A  question  which  is  logically  addressed  is:  "What  good 
is  a  density  estimator?"  Some  uses  of  density  estimators 
were  discussed  by  Sweeder  (202)  and  much  of  the  work  pre¬ 
sented  in  this  dissertation  is  an  extension  of  his  ground¬ 
breaking  efforts.  Some  other  uses  of  density  estimates 
are  discussed  throughout  this  paper.  The  specific  appli¬ 
cations  presented  by  Sweeder  were  avoided  here  since  re¬ 
doing  them  with  a  slightly  different  estimut.  seemed 
rather  trivial.  Some  new  applications  of  density  esti¬ 
mates  are  presented  in  this  dissertation.  In  particular, 
a  two-sample  test  is  discussed  which  takes  advantage  of 
the  potentially  large  difference  created  by  an  unbounded 
operator  acting  upon  relatively  small  differences  in  the 
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CDFs.  The  inteat  of  this  dissertation  is  to  develop  the 
actual  estimator  and  show  the  use  of  an  estimator  of  this 
type.  Many  of  the  "proofs"  rely  on  empirical  evidence 
obtained  from  tremendously  expensive  Monte  Carlo  analysis. 
In  these  cases  only  enough  of  the  Monte  Carlo  runs  were 
completed  to  demonstrate  the  techniques  and  results. 

Throughout  this  dissertation,  comparisons  will  be  made 
among  results  from  samples  from  uniform,  normal,  and 
double  exponential  (Laplace)  distributions.  The  estimator 
developed  is  not  limited  to  these,  or  even  symmetric, 
distributions,  but  for  comparison  purposes  with  previous 
research  (226)  much  of  the  work  presented  here  uses  these 
three  distributions,  which  are  assumed  to  be  representa¬ 
tive  of  platykurtic,  mesokurtic,  and  leptokurtic  distribu¬ 
tions  in  general. 

The  dissertation  is  divided  into  four  main  sections 
(Chapters  II-V).  The  first  discusses  the  development  of 
the  estimator  itself,  the  underlying  theory,  and  the 
trade-offs  made  in  its  development  along  with  the  reasons 
for  those  trade-offs.  The  second  main  section  is  essen¬ 
tially  a  validation  of  the  estimator  developed  in  the 
first  section.  Both  graphical  and  tabular  comparisons  of 
results  are  given.  The  third  section  presents  some  appli¬ 
cations  Including  parameterization  through  distance  esti¬ 
mation,  a  new  small  sample  analysis  technique,  and  a  new 
two-sample  test.  Other  possible  applications  are  dls- 
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cussed.  The  last  main  section  was  inspired  by  a  goal 
which  was  set  during  the  definition  phase  of  this  research 
program.  That  goal  was  to  develop  a  density  estimator 
which  could  be  used  by  the  relatively  uninitiated  without 
the  requirement  to  choose  any  parameters.  This  section 
presents  some  general,  easy  to  understand  and  apply  guide¬ 
lines  for  when  to  use  this,  or  for  that  matter  any,  non- 
parametric  estimator.  Supporting  data  for  a  choice 
between  this  estimator  and  some  others  is  presented. 

The  final  chapter  summarizes  the  results  of  this 
research  effort.  There  is  always  another  step  to  be  taken 
in  research  and  Chapter  VI  discusses  several  possible 
directions  in  which  to  take  that  step.  Hopefully  it  will 
be  of  use  to  those  continuing  down  the  path  to  better  non* 
parametric  density  estimators  and  new  applications  of 


those  estimators. 


II.  The  Estimator. 


Non-parametric  density  function  estiaators  suffer,  to 
one  extent  or  another,  from  some  or  all  of  the  following 
problems : 

1)  They  require  user  specified  "parameters"  which  can 
greatly  affect  the  shape  of  the  estimated  function,  but 
cannot  be,  or  are  not  easily,  optimally  determined.  This 
problem  is  exacerbated  when  the  estimator  is  overly  sensi¬ 
tive  to  these  "parameters".  For  example,  the  maximum 
penalized  likelihood  estimator  (39,178,203)  requires  two 
such  parameters.  Although  it  is  theoretically  possible  to 
find  the  optimal  values,  realistically  the  values  are 
determined  by  trial  and  error.  This  makes  density  estima¬ 
tion  an  art,  with  the  result  that,  when  this  particular 
estimator  is  used  by  the  unskilled,  all  estimates  tend  to 
look  like  normal  density  functions.  Since  this  estimator 
and  a  kernel  estimator  with  similar  problems  are  the  only 
ones  commonly  available  (they  appear  in  the  International 
Mathematical  and  Statistical  Libraries  (IMSL)  package  of 
FORTRAN  subroutines  available  through  IMSL,  7500  Bellaire 
Blvd.,  Houston,  TX,  77036),  many  potential  users  may  have 
rejected  non-parametric  density  estimation  as  too  diffi¬ 
cult  or  not  accurate  enough. 

2)  They  tend  to  be  noisy,  like  the  frequency  polygon 
estimator  (203).  This  can  be  corrected  by  averaging  or 


other  smoothing  processes  such  as  Sweeder  used.  Many  of 
the  Bootstrap  techniques  (47)  are  suitable  for  this  job. 
Frequency  domain  smoothing  via  Fourier  transform  analysis 
may  also  be  used. 

3)  They  are  not  uniquely  defined,  particularly  for 
small  samples.  That  is,  one  may  obtain  an  entirely  dif¬ 
ferent  estimate  by  slightly  varying  a  parameter  of  the 
estimator.  The  histogram  estimator  typifies  this  problem. 

4)  They  only  give  reasonable  estimates  for  relatively 
large  samples.  This  is  a  problem  in  virtually  all  non- 
parametric  estimators  ( Sweeder's  being  a  notable  excep¬ 
tion.)  Unfortunately,  in  many  cases,  large  samples  are 
difficult  or  expensive  to  obtain. 

5)  They  require  restrictive  assumptions  about  the 
form  of  the  underlying  distribution  (i.e.  symmetry,  uni¬ 
modality,  infinite  or  finite  support,  etc.) 

6)  They  do  not  balance  sensitivity  and  robustness. 
That  is,  they  tend  to  either  give  the  same  density  shape 
for  samples  from  a  wide  variety  of  distributions,  or  they 
are  overly  sensitive  to  sample  peculiarities  such  as  out¬ 
liers  or  closely  grouped  data  points.  The  very  nature  of 
a  random  sample  makes  these  deficiencies  difficult  to 
handle.  For  example,  if  one  makes  adjustments  to  the 
estimator  to  take  into  account  close  spacing  of  sample 
points,  then  true  peaks  in  the  density  will  be  rounded  and 
true  valleys  will  be  filled. 


7)  They  result  in  an  infeasible  estimate.  Many 
common  density  estimators  yield  negative  densities,  others 
estimate  support  which  does  not  include  the  entire  sample. 

The  above  problems  cannot  all  be  solved  simulta¬ 
neously.  The  interactions  among  these  areas  is  what  makes 
density  estimation  so  difficult. 

All  non -pa ra me t r i c  density  estimators  have  the  add¬ 
itional  problem  of  estimating  the  support  for  the  density. 
This  is  usually  handled  in  one  of  the  following  manners: 

1)  Estimate  f(x  |  X(i)  <.  *  <.  x(n)) 

2)  Estimate  the  support  based  on  some  sample  extrapo- 
latlon  rule. 

3)  Assume  some  support  based  on  knowledge  of  the  data 
source,  for  example  (0,«),  (0,1),  (-oo,«o),  etc.  This  is 
a  sort  of  Bayesian  non-parametric  estimation. 

4)  Estimate  the  support  from  the  extreme  order  sta¬ 
tistics  from  a  set  of  samples.  That  is,  estimate  the 
distribution  of  ^  and  x(n)  an<*  select  some  percentage 
point  of  these  distributions  as  the  estimate  of  the 
endpoint  (64).  There  is  seldom  enough  data  available  to 
actually  use  this  method. 

Endpoint  estimation  techniques  used  in  this  estimator 
will  be  discussed  later  in  this  chapter.  For  now  we 
assume  that  the  density  is  non-zero  only  on  the  Interval 
Ix(0)»x(n+l)l »  that  values  of  X(0)  and  X(n  +  1)  have  already 
been  defined  or  estimated,  and  that  these  values  converge 


to  the  true  support  of  the  distribution  as  the  sample  size 
increases . 

II. 1  Development  of  the  Estimator 

Consider  a  random  sample,  x  j  ,  *2  , x^  , ... ,  xQ  ,  of  size  n 
from  an  unknown  univariate,  continuous  probability  distri¬ 
bution  function,  F(x).  Let  x(  i )» x( 2  )*•••» x(u )  represent 
the  ordered  random  sample  such  that  x(  l)i.x(2)^.*'*^.x(n)‘ 
Now  define  Gj  *  G(x^jj),  i«l,2,...,n  be  the  plotting  rule 
that  is  associated  with  the  it^>  order  statistic.  G^  is  a 
value  of  the  sample  distribution  function  at  this  point, 
of  the  form  Gj  -  ( i+ or  )  /  ( n+ 0  ) ,  with  - 1  <  a  <  0  <  1  (we 
will  discuss  selection  of  plotting  rule  parameters  a  and  0 
in  more  detail  later  in  this  chapter.)  Let 

Ag  -  Gj  -  Gj.j  -  1/  (n+0) 

We  know  that 

(i) 

f ( x) d x  -  F(x^jj)  -  F (X(i-x)> 

<i-l) 


if  we  approximate 

F(X(i))  -  F(x(1_1))  -  AG 
and  assume  that  f(x)  varies  linearly  between 
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and 


we  obtain: 


x(i> 

ft  -  2AG/(x(1)  '  x^-i))  -  f  i-i 

where 

fi  -  f(x(1)) 

For  a  plotting  rule  using  /3  ■  0  this  is  similar  to  the 
classical  frequency  polygon  estimator. 

This  estimator  has  some  nasty  properties.  The  value 
at  some  points  may  be  negative  since  fj_j  is  not  guaran¬ 
teed  to  be  less  than  ( 2  AG  ) / ( ^ j -x^ j _ j ^  ) .  In  addition, 
since  f(x/Q))  or  f(x^n+jj)  may  be  arbitrarily  defined 
there  are  an  Infinite  number  of  possible  estimators.  Even 
if  we  define  the  density  as  zero  at  the  endpoints  the 
estimation  process  can  be  started  at  either  end  and  the 
result  will,  in  general,  depend  upon  the  end  at  which  we 
start.  This  means  that  the  estimator  is  dependent  upon 
the  path  taken  through  the  sample. 

Both  of  these  undesirable  characteristics  may  be  cor¬ 
rected.  Assume  some  fj  is  the  first  estimate  calculated 
as  a  negative  value.  Let  f*  ■  0, 
and  set 

*i-l  "  l4AG  "  *i-i(x<i-l)  "  x(i-2))  1/(x(i)  ”  x(i-2)) 

A  * 

The  next  calculated  value,  +  i  *  0,  will  always  be 

*  it 

greater  than  zero,  as  will  f,_.  ■  0  (See  Figure  1).  The 
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0.00 


result  of  this  process  is  a  piecewise  linear,  non-negative 
estimate  of  f(x)  given  by: 


f(x)  -  fj.j  +  ^x'x(i-i)^  fi~f  i-l)/^x(i)“x(i-l)^ 


and 


x(i-l)  <  x  <  x(i) 


f  (x) 


x  i  fx(0)’X(n+l)] 


Which,  when  integrated,  yields  a  continuous,  piecewise 

A 

quadratic  distribution  function,  F(x). 

In  order  to  remove  the  ambiguity  in  f(x)  which  exists 
from  the  possibility  of  starting  the  process  at  either 

A 

end,  we  calculate  the  forward  estimate,  f(x),  and  the 

A 

backward  estimate,  £(x),  and  average  the  two  to  obtain 
f(x).  This  process  not  only  removes  the  path  ambiguity 
but  also  tends  to  eliminate  zero  values  of  the  density 
estimate  introduced  in  order  to  assure  non-negativity  of 

A 

f(x).  Figures  2(a)  and  2(b)  show  the  results  of  using 
the  estimator,  as  described  so  far,  on  random  samples  of 
size  100  from  two  distributions.  Notice  that  this  estima¬ 
tor  is  quite  noisy.  We  will  consider  a  solution  to  this 
problem  shortly. 

The  estimator  does  have  some  desirable  properties  when 
we  consider  the  distribution  function  estimate. 

A 

1)  F(x)  is  differentiable  everywhere. 

A 

2)  F(x)  is  a  distribution  function. 

3)  Gj.!  1  F^x(i))  1  Gf  1-1,2,  ...,n 
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Estimate  for  Uni 


These  properties  are  essentially  the  sane  as  those  of 
Sweeder's  estimator.  However  this  estimator  has  the  addi¬ 
tional  property  that  it  does  not  necessarily  go  to  zero  at 
the  sample  points.  It  is  this  feature  that  will  allow  us 
to  show  convergence  to  the  true  density,  and  reduce  the 
amount  of  smoothing  required  to  obtain  a  "good”  density 
estimate . 

Analogous  to  Sweeder  we  may  use  a  Bootstrap  (47)  type 
technique  to  obtain  some  smoothing  of  our  estimator.  This 
is  desirable  for  cases  where  we  have  unnaturally  closely 
grouped  data  within  the  sample  (l.e.  data  which  does  not 
reflect  the  true  character  of  the  underlying  distribu¬ 
tion.)  Experimentation  with  samples  from  known  distribu¬ 
tions  indicates  that  this  is  a  problem  which  occurs 
frequently  in  small  sample  situations.  We  choose  d  sub¬ 
samples  from  our  original  sample  as  follows: 

j-k+md,  k«l,2,...,d;  d<  so  ; 
d£n/2;  ra«0 , 1 , 2 ,...,[ (n-k) /d ] } 

Estimates  are  calculated  using  each  of  these  subsamples 
successively  so  that  we  have  estimates,  f  ^  (  x  )  , 
j *  1  , 2 . d.  The  estimator  for  f(x)  is  obtained  by 


averaging : 


II. 2  Properties  of  the  Estimator 


A  desirable  property  of  an  estimator  is  that  it  con¬ 
verges  to  the  true  function  as  the  sample  size  increases. 
The  estimator  obtained  so  far  has  this  property  as  will  be 
shown  in  the  following  theorems,  but  first  some  fundamen¬ 
tal  definitions. 

Let  R  be  the  real  line,  B  a  Borel  field  on  R  and  P  a 
probability  measure  defined  on  B.  The  function  F  defined 
on  (  R  ,  B  ,  P )  by  F(x)  *  P(  {  I :  I «(  -  oo  ,  x  ]  R}>  is  the  distribu¬ 

tion  function  of  P. 

F  (x),  as  we  have  defined  it,  is  a  distribution 
function,  since: 

_  x  „ 

1)  Fn(x)  *  f  fQ(x)dx  is  non-decreasing 

—  00 

since  fQ(x)  Is  non-negative  by  construction. 

2)  F  (x)  is  continuous  by  construction 

n 

3)  Fq(x)  -  0  x  <  x(Q) 

A 

F  (x)  *  1  x  >  x,  ...  by  definition 

n  vn  +  l; 

Therefore  Fn(x)  is  a  probability  distribution  function. 

The  following  development  also  assumes  a  random 
sample,  Xj,X2»...,xn  from  a  continuous  distribution,  F(x), 
and  that  F'(x)  exists.  Parenthetical  subscripts  are  again 
used  to  represent  the  ordered  sample. 
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Lemma  1  -  A  finite  convex  combination  of  sequences  of 
functions,  each  of  which  converges  (uniformly)  to  a  single 


function,  will  converge  (uniformly)  to  that  function. 


P  roof 


Let 


{f£j(x)} 


1*1,2,.. 
j “1 »  2 , 3 , 


k 


be  a  set  of  sequences 


with 


|  f  ^ . ( x )  —  f ( x  > |  <  e  j  >  N  i»l,2,...,k 


and  let 


S  ( x ) 


^  aiEi)(x) 

i=  1 


i-1 


9 


ftj  >  0 


then 


|S(x)  -  ]Catf(x)|  =  |  ^G^f^x)  "  7%i*(»)l  * 

|  5Z«1(  f  ^  (x)  -  f(x))|  <  |  f  1  j  (x)  -  f(x)|  < 


a  i  € 


e 


The  extension  to  uniform  convergence  is  analogous  if  we 
start  with  the  hypothesis: 


|  f  1  j  ( x )  -  f  ( x )  |  <  e  ;  3>N;  V  x  e  R 


Lemma  2  -  Given  a  partition,  *  {  x  ^£x ££• . .£xq  } 
(a,b)  such  that  g(x;Pn)  - >  g(x),  any  evenly  divided  sub¬ 

partition, 


Pm  “  ^X(j)’  J*k+md»  k-l,2,...,dj  d<  oo  ; 
d£n/2;  m-0,1,2 . l(n-k)/d]} 

results  in  g(x;P  )  — - — — — >  g(x). 

m 

Proof  - 

|  g  ( x ;  P  )  -  g(x;Pq)|  _<  e/2  ;  V  x  e  (a,b)  ;  p,q  >  Nj  (1) 

Since  P  and  P  are  partitions  and  we  have  uniform  conver- 
P  q 

gence,  this  is  the  Cauchy  condition.  Also 

|g(x)  -  g(x;Pp)|  <  e/2  ;  x  e  (a,b)  ;  P  >  N£ 

by  definition. 

Now  let  N  *  d  max(N1>N2)  then  from  (1)  we  have 

|g(x;Pp)  -  g(x;PN) |  <  e/2 

and 

lg(x;PN)  -  g(x)|  <  |g(x;Pp)  -  g(x;PN)|  + 

|g<x)  -  g(x;Pp)  |  <  e 

so 

|g(x;Pm)  -  g ( x ) |  <  e  x  €  (a,b)  m  >.  N 

That  is: 

g(x;Pm)  >  g(x) 
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»■■*»  Islmli 


■JUJU4  n 


The  above  Lemmas  allow  us  to  prove  convergence  of  the 
basic  estimator  and  then  extend  it  easily  to  the  sub¬ 
sampled  case  without  directly  considering  each  estimator 
based  on  a  subsample. 


Theorem  1  -  The  sequences  F^(x)  converge  almost 

everywhere  to  F(x)  where: 


*  *  XJ  (  0) 


i-  1+Qf 


Fj  (  x) 


+  f j(i-l)(x_Xj(i-l))  + 
**j<l)~*J(i-l)^X“Xj(i-l)*  1 
(Xj(i)_Xj(i-l)) 


xJ(i-D  i  x  i  xj(D 


x  *  Xj (m+1 ) 


(m+/3)  (xj  ^  i)_xj  ( i_i)  >  fj(i-l) 


f  j  (x) 


Kj(1-1)  i  X  i  Xj(i) 


otherwise 


*_*.  ,  r  «  .*  V  *.*  _  "  »  *  •  *  •  *  -  . 


(Since  j  represents  the  index  of  a  sequence  based  upon  a 
particular  subsample,  and  we  intend  to  show  the  proof  for 
one  subsample  and  later  extend  it  using  Lemmas  1  and  2,  we 
-will  temporarily  drop  the  j  subscript  for  simplicity.) 


Proof  - 

Consider  the  points  x^^; 

i  *  1,2,...  ,n+ 1 

1 

0 

1 

i-0 

F(x(0) 

j 

)  G 

1  1 

i«l , 2  ,  .  .  .  ,n 

1 

i*n+l 

This  is  essentially  the  empirical  distribution  function, 
E(x),  which  has  been  shown  to  converge  almost  everywhere 
to  F ( x ) . 

That  is 


11m 
n— >  «> 


“»  r<X(l)> 

n—/  co 


lim  G 
n— >  «  1 


F(  x 


(i) 


) 


For  any 


x  in  the  Interval 


lx(  1_D  ,X(  i))  we  know: 


F(X(i-i)>  1  F<x>  i  F^x(i) 


) 


from  the  monotone  property  of  F(x). 
So: 


lim  *(*(!_!))  1  lim  F(x)  <  lim  F(x^) 
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(where  we  denote  lim  as  lim)  or: 

n— >oo 

llm  Gi_i  £  llm  F<x>  £  lim  G! 

F(X(i_i))  <  llm  F(x)  <  F(x(t)) 

and  since 


F<x(i_l)>  £  F<x>  £  F(x(i)) 


|F(x)  -  llm  F(  x)  |  <  F(x^)  -  F(x(1_1)) 


i+a  1-1+a  1 

llm  — —5  -  llm  — x-  *  lim  — —  *  0 

n+p  n  +  /3  n+0 


or 


llm  F  (x)  *  F(x)  almost  everywhere  (a.e.) 

n— >  00 


and  by  Lemma  1  we  have: 


F(x) 


Fj<x) 


- >  F( x ) 


a.e. 


We  have  shown  convergence  of  the  distribution 
function.  We  now  proceed  to  show  the  more  powerful 
result,  convergence  of  the  density  function  estimate  to 
the  true  density  function. 

A 

Theorem  2  -  The  density  function  estimate,  f(x),  con¬ 
verges  almost  everywhere  to  the  true  density  function, 
F'(x)  *  f(x),  provided  f(x)  exists  and  is  continuous. 
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[*Yr 


For  some  point  x  In  the  Interval  l x( *x( 

F(x)  -  F(x-l/n)  G1  -  G1_1 

—  JL  A.  Ill 

1/n 

X(l)-X(l-1) 

11m 

AG 

X(i)'X(i-l) 

f 

2  AG 

f  * 

£i 

X(1)~X<1-1) 

f  i-1 

‘(1-1)  ( 1-2 )  (i-2)  (1-3) 


,X(2)"X(1) 


X(1)'X(0) 


Now  consider  che  limit  of  the  estimate.  Assume: 


x(l)‘x(l-l)  x(l-l)“x(i 


*  0 


Since  the  limits  of  both  terms  exist,  this  Implies 


X(1)~X<1-1> 


+  1  lm 


X( 1-1) “X( 1-2) 


111 


types  of  discontinuities  in  the  deasity  function. 


Theorem  3  -  The  density  function  estimate,  f(x),  con¬ 
verges  almost  everywhere  to  the  true  density,  F'(x)  * 
f(x),  provided  F(x)  is  continuous,  f(x)  exists,  and  we 
define  f(x)  »  ( f ( x _) +f ( x  +  ) )  /  2  . 

Proof  -  As  in  Theorem  2  assume  we  are  interested  in 
the  value  of  f(x)  where  x  €  ^  x  (  i  -  1 )  * x  (  i  )  ^  * 

Case  1  -  f(x)  has  one  discontinuity  at  x^  where 

x( i-l)-X0-X( i) 

f  =  AG _  +  /  AG _ _ AG_ _ 

1  x(i)~x(i-l)  \X( i)“X(i-l)  X( i-1 > “X( 1-2 ) 

♦  ...  ±(--n - — *t— ) 

\  <  2 )  X<1)  X<1)  X(0)/ 

All  the  terms  in  parentheses  above  will  go  to  zero  as  in 
Theorem  2  except  those  containing  ^  x(  ^ ) -x( ^ j ^  »  leaving: 


, .  ;  , ,  2AG 

lim  f .  *  lim  - 

i  X, . x -x, 


-lim 


AG 


‘(i)  (i-1) 


X(i-l)_X(i-2) 


11m 


2  AG 


X( 1) ~X( i-1 ) 


"  F'-(x0) 


But  if  the  derivative  at  the  discontinuity  is  defined  as 
the  average  of  the  left  and  right  derivatives,  then: 


2 


lim  f, 


f:<xo)  +  f;<v 


-  *:<x0>  -  p;(*o) 


and  since  no  x(i)~x(i_i)  terlis  appear  in  the  f  j_  ^ 


equation,  we  have: 


lim  f 


f(x  ) 


That  is,  the  value  of  the  density  estimate  to  the  left  of 
the  discontinuity  converges  to  the  true  density  value  to 
the  left  of  the  discontinuity. 


Case  2  -  Consider  the  case  where  the  discontinuity  occurs 
in  some  other  interval.  If  it  occurs  after  there  is 

no  effect  and  we  have  the  same  result  as  in  Theorem  2  for 

a 

fj.  If  the  discontinuity  occurs  before  x^^,  say  between 
the  previous  two  points,  we  have: 


lim  f  ,  (  x )  -  lim - - 

1  x(i)-x(i-l) 


X(i)_X(i-l)  X(i-l)_X(i-2) 


(i-l)~x(i~2)  x(i-2)”x(i 


nr) 


where  all  other  terms  go  to  zero  as  in  Theorem  2, 


» 


This  reduces  to: 


11m  f.(x)  *  F'(x)  +  F'(xn)  -  2  11m  - - - -  +  F'(x  )  - 

1  +  0  x(  i  -1 )  ~x  (  i  -2  )  "  0 

f:u  >  +  F+(x0) 

F'(x)  +  F;(xq)  -  2  - ^ - —  +  Fl(x0)  *  F'(x) 

This  proof  extends  directly  to  any  finite  number  of 
discontinuities  In  the  probability  density  function  as 
long  as  the  value  at  the  discontinuity  can  be  defined  as 
the  average  of  the  values  on  either  side.  Jump  dis¬ 
continuities  fall  Into  this  category.  The  only  other 
restrictions  on  the  estimator  are: 

1)  The  endpoint  estimator  must  converge  to  the  true 
endpoint . 

2)  There  must  be  a  finite  number  of  subsamples. 

Both  of  these  restrictions  are  easily  met. 

Now  that  we  have  established  the  form  of  the  estima¬ 
tor,  we  must  define  the  following: 

1)  The  number  of  subsamples 

2)  The  choice  of  endpoints 

3)  The  choice  of  plotting  positions 

Since  one  goal  of  this  research  is  to  develop  a  ’’hands- 
off"  estimator,  these  choices  will  either  be  made  a  priori 
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or  the  estimator  will  make  the  choices  based  on  some 


sample  statistic. 

'II. 3  Smoothing 

Aside  from  the  well  known  problems  of  numerical  dif¬ 
ferentiation,  there  are  two  primary  contributors  to  rough¬ 
ness  of  the  density  estimate  as  described  to  this  point. 
The  first  is  the  existence  of  artifically  large  spikes  due 
to  unnaturally  close  spacing  of  several  points  in  the 
random  sample.  The  second  is  the  tendency  of  the  estima¬ 
tor  itself  to  over  (under)  estimate  the  value  of  the 
probability  density  function  at  a  point  when  the  estimate 
at  the  previous  data  point  was  too  low  (high).  The  first 
problem  results  in  a  density  estimate  with  "random"  peaks 
and  valleys,  while  the  second  tends  to  create  oscillations 
in  the  estimate  at  a  frequency  equal  to  the  number  of 
sample  points  divided  by  twice  the  support  interval.  The 
two  problems  have  been  attacked  in  this  dissertation  some¬ 
what  Independently,  despite  the  fact  that  a  solution  to 
one  will  affect  the  other. 

After  investigating  several  techniques  to  smooth  the 
osclllltory  behavior  of  the  estimator,  including  digital 
filtering,  frequency  domain  modifications,  and  inversion 
of  the  distribution  function,  a  straight-forward  averaging 
technique  was  used.  Given  the  data  points  and  an  estimate 
of  the  probability  density  function  at  these  points, 
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{ ( X  t ,  f t)  ;  i«0 ,1 , . . . ,n  +  l } 

we  form  a  new  data  set  and  corresponding  set  of  density 
estimates  as  follows: 

A 

((y1.f(y1));  y0=xQ;  yi*<xi-i+xi>/2;  1-1 *2 » • ♦ • *n+l ; 


n+2  n+1 


;  f (y 1)*( f ( xt  l)  +  f (x1) )/2 ;  1-1 ,2  ,  . . . ,n  +  l ; 


{(y0)~f0;  f(yn+2)“fn+l1 


We  then  perform  a  similar  procedure  to  get  back  to  the 
original  data  points: 

{<Xi.fi>;  f1-(f1+f1)/2;  1-0,1,. ..,n;  f Q-f 0 ;  fn+i“fn+i) 
or  after  simplification: 


fi  *  <fl-l+2fl+fl+l>/4  f*1*2 


By  Lemma  1  this  operation  will  not  affect  the  convergence 
properties  of  the  estimator  since  this  is  merely  a  convex 
combination  of  estimates  which  all  converge  to  the  true 
dens Ity  . 

The  second  type  of  smoothing  Is  designed  to  desensi¬ 
tize  the  estimator  to  anomalous  behavior  in  the  data.  The 
Quenoullle-Tukey  jackknife  (154)  and  other  Bootstrap 
methods  (47,48,59)  are  well  suited  to  this  purpose.  The 
fundamental  technique  in  all  of  these  methods  is  to  gener- 
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ate  estimates  with  portions  of  the  data  and  combine  them 


in  a  manner  which  tends  to  alleviate  the  problems  assoc¬ 
iated  with  the  estimator  operating  upon  the  entire  sample. 
Efron  (47),  Rustagi  (164),  and  Sweeder  (202)  have  all  used 
this  approach  in  density  estimation.  The  problem  with 
these  methods  is  that  if  one  applies  the  method  to  a 
function  estimate  rather  than  a  point  estimate  the  inter¬ 
actions  between  the  “  subestimates”  can  slow  the  conver¬ 
gence  of  the  estimator  substantially. 

Ideally,  every  random  sample  would  be  of  the  form: 

Xj  *  F-1(Gi)  i»l,2,...,n 

where  Gj  is  some  plotting  rule.  Realistically,  we  are 
fortunate  if  the  whole  sample,  let  alone  individual  data 
points,  accurately  portrays  the  characteristics  of  the 
underlying  density.  Subsampling  Is  a  tried  and  proven 
technique  to  reduce  the  overall  noisiness  of  the  density 
estimate.  The  philosophical  idea  behind  subsampling  is  to 
place  unnaturally  closely  spaced  data  points  into  dif¬ 
ferent  subsamples  before  the  estimate  is  actually 
developed.  We  have  already  discussed  the  theory;  the 
question  that  remains  is  how  many  subsamples  to  use. 

A  Monte  Carlo  analysis  was  performed  to  determine  the 
"optimal"  subsample  size.  Twenty-five  runs  were  made  from 
each  feasible  combination  of  eight  subsample  sizes,  (5,10, 
1  5 ,20,2  3 ,25 , 30,4  5) ,  three  d  i  s t r 1 b u t ions ,  (uniform,  normal. 
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and  double  exponential),  and  six  sample  sizes  (5,10,20,40, 
100,200).  The  mean  integrated  square  error  (MISE)  and 
modified  Cramer-von  Mises  (MCVM)  integral  error  (187)  were 
calculated  and  results  are  shown  in  Figure  3.  As  a  result 
of  this  analysis  the  optimal  points  per  subsample  were 
determined  to  be: 

Uniform  type  distributions. . 2 

Normal  type  distributions . 4 

Laplace  type  distributions . 10 

Actually  fractionally  more  points  per  sample  were  used 
based  on  subsequent  studies  which  showed  that,  for  sample 
size  100,  the  "optimal"  number  of  subsamples  for  a  uni¬ 
form  is  46,  for  a  normal  is  23  and  for  a  double  exponen¬ 
tial  is  10.  The  ten  subsamples  for  a  double  exponential 
is  not  really  optimal  in  a  MISE  sense,  but  fewer  sub¬ 
samples  were  found  to  yield  an  unsatisfactorily  noisy 
estimate  while  ten  subsamples  provided  an  acceptable  esti¬ 
mate  with  little  sacrifice  in  calculated  error.  For 
sample  size  100  we  selected  10  subsamples  as  the  minimum 
number  to  avoid  any  potential  noise  problems. 

Since  the  "optimal"  subsample  size  is  not  a  constant, 
we  need  to  be  able  to  discriminate  between  the  classes  of 
distributions  represented  by  the  uniform,  normal  and 
Laplace.  A  modification  to  Hogg's  Q  (79)  statistic  was 
chosen  for  this  purpose. 


Hogg's  Q  is  given  by: 

Q  -  (UQ.  -  La)/(U0  -  L0) 

Where:  Ua  *  average  of  largest  na  order  statistics 

La  z  average  of  smallest  na  order  statistics 
Up,  Ip  are  similar  to  Ua,  La 

The  statistic  defined  above  assumes  symmetric  distribu¬ 
tions,  thus  it  is  not  acceptable  for  a  broad  class  of  non- 
parametric  estimators,  including  this  one.  We  are  par¬ 
ticularly  concerned  with  densities  which  are  assymmetric 
or  multimodal.  For  these  purposes  we  define  three  pseudo¬ 
samples: 

<x(i)5  x(i)*x(i) *  x( i )— xm ’  x( i)“2xm_xn+l-i) *  x(i)>xm> 
<x?i);  x?i)“x(i)*  x( i)— xm 5  xfi)*2xm~xn+l-i) *  x(i)<xmJ 
^x(i)*x(i)*2x25“x(n+l-i)’  x(i)<xm; 

x?i)’2x75“xn  +  l-i)»  x(i)>xm} 

Where : 

xm  -  sample  median 

x2 5  “  <xra+x(0>)/2 
x75  “  (xa+x(n+l))/2 

These  pseudosamples  are: 

1)  the  first  half  of  the  original  sample  reflected 
about  the  median. 

2)  the  second  half  of  the  original  sample  reflected 


about  the  median. 


3)  the  first  half  of  the  sample  reflected  about  an 
estimate  of  the  252  point  and  the  second  half  reflected 
about  an  estimate  of  the  752  point. 

The  Q  statistic  was  calculated  for  the  original  and 
each  of  the  pseudo-samples  (Qq,  Q^»  Q2 »  Q3).  Based  upon 
the  subsample  size  study  and  the  relative  errors  we  estab¬ 
lished  the  following  guidelines: 

1)  When  in  doubt  choose  too  many  points  per  sub¬ 
sample.  An  error  in  this  case  will  result  in  the  density 
maintaining  its  characteristic  shape  but  showing  noise 
characteristics  . 

2)  Be  absolutely  certain  that  the  density  is  of  the 
uniform  type  before  choosing  the  uniform,  since  choosing 
the  small  subsample  size  tends  to  flatten  spiked  den¬ 
sities. 

In  order  to  achieve  these  objectives,  subsample  size 
of  about  2  was  chosen  only  when  Qq,  Q ^  ,  Q2  »  and  Q-j  were 
smaller  than  the  chosen  breakpoint  value  between  uniform 
type  distributions  and  normal  type  distributions.  All 
four  values  of  Q  were  used  in  order  to  assure  that  the 
probability  of  a  spike  in  any  portion  of  the  distribution 
was  remote.  Normal  type  d is t r 1 bu t ion s  were  assumed  when¬ 
ever  Qq,  Q},  and  Q2  were  in  the  range  between  the  uniform- 
normal  and  normal-double-exponential  breakpoints.  In  all 
other  cases,  the  distribution  was  assumed  to  be  "spikey” 
and  the  subsample  size,  n8,  was  chosen  as  follows: 
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min 


( 


®ax( Qq  ,QX  ,Q2  ,Q3)  -  Qn 

r*  r* 

Qd  ~  Qn 


(nd  -  nn)  + 


"»  •  »d) 


where : 


Qd 


“n 

nd 


theoretical  Q  for  normal  distribution 
theoretical  Q  for  double  exponential  distribution 
normal  optimal  subsample  size 
double  exponential  optimal  subsample  size 


After  calculating  the  subsample  size,  the  calculated 
value  was  bounded  on  the  high  side  by  n/2  points  per  sub¬ 
sample,  and  on  the  low  side  by  2  points  per  subsample. 
This  was  based  on  empirical  evidence  that  It  was  never 
advantageous  to  have  less  than  two  subsamples  and  on  the 
Inability  of  this  estimator  to  calculate  a  density 
function  for  a  sample  of  size  one. 

The  values  used  for  the  breakpoints  were  chosen  (based 
upon  a  *  .04  and  /3  -  .5)  as: 


Qun  -  mln(l. 45  +  .0075n,  2.31) 


Qnd  -  mln(l. 9  +  .Oln,  3.12) 


which  are  approximate  linear  fits  to  the  optimal  numbers 
determined  by  Rugg  (163)  limited  by  the  average  population 
values  for  the  distributions.  These  breakpoints  are  not 
critical  due  to  the  method  of  using  p s eudo -s a mp 1 es  and 
based  upon  the  relatively  small  variations  in  subsample 


size.  The  values  of  population  Q's  used  were: 


Qd  a  3-53 

Qj  *  2.70 

Pi  *  1-92 

This  method  resulted  in  the  identifications  shown  in 
Table  1.  These  percentages  are  based  on  a  Monte  Carlo 
analysis  of  1000  cases  with  sample  size  100. 


Table  1  -  Correct  Identification  Percentages  (n*100) 


Actual  Distribution 

Uniform 

Normal 

Laplace 

Uniform 

95 

1 

0 

Identified 

Normal 

5 

73 

0 

as : 

Intermediate 

0 

26 

32 

Lap  lace 

0 

0 

68 

As  the  sample  size  decreases  there  is  a  tendency  for 
the  sample  to  look  more  like  a  sample  from  a  uniform 
distribution.  This  is  reflected  in  Table  2  which  is 
similar  to  Table  1  but  for  sample  size  10.  In  this  case 
there  is  no  Intermediate  subsample  size  due  to  the  small 
number  of  subsamples  in  all  cases. 
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Table  2 


Correct  Identification  Percentages  (n-10) 


Actual  Distribution 

Uniform 

Normal 

Laplace 

Identified 

as  : 

Uniform 

76 

34 

24 

Normal 

21 

51 

CM 

Laplace 

3 

15 

34 

The  amount  of  smoothing  may  be  adjusted  by  the  user  If 
prior  knowledge  of  the  underlying  density  is  available. 
However,  one  may  easily  be  led  Into  the  trap  of  over¬ 
smoothing  In  order  to  obtain  a  "pretty”  density  while 
simultaneously  forfeiting  some  accuracy. 

II. 4  Support  Estimation 

For  practical  purposes  probability  distributions  can 
be  considered  to  have  finite  support,  despite  the  fact 
that  they  are  often  approximated,  for  mathematical  con¬ 
venience,  by  distributions  with  infinite  support.  When 
estimating  a  density  function,  the  estimate  can  be  quite 
sensitive  to  variation  in  the  estimated  endpoints.  This 
is  particularly  true  for  platykurtlc  distributions. 
Consider,  for  example,  the  uniform  distribution  shown  in 
Figure  4. 
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Figure  4  -  Sensitivity  to  Support  Estimation 

The  endpoint  estimate  Is  less  critical  for  leptokurtlc 
distributions  (where  both  tails  are  long)  since  the  bulk 
of  the  density  function  Is  away  from  the  endpoint  and 
unlikely  to  be  greatly  affected  by  small  variations. 

Endpoint  selection  is  avoided  in  most  non-parametric 
density  estimation  techniques  by  estimating 
f  (  x  |  j  )<*<*(  n ) )  •  Alternatives  Include  various  extrapo¬ 
lation  rules  and  methods  of  estimating  percentage  points 
of  order  statistics.  Hall  (64)  estimates  the  distribution 
of  the  first  and  last  order  statistic.  Unfortunately, 
this  approach  is  only  possible  with  large  samples,  since  a 
sample  of  first  order  statistics  must  be  generated  in 
order  to  start  the  estimation  procedure.  Bootstrap 
techniques  (76)  have  been  proposed  for  the  endpoint  esti- 


nation  task,  but  they  frequently  estimate  support  inside 
the  sample  bounds  or  well  outside  the  actual  support  for 
samples  from  distributions  with  finite  support  (28,29). 

The  shape  of  the  density  in  the  vicinity  of  the  first 
or  last  sample  points  is  related  to  the  distance  from  the 
extreme  order  statistic  to  the  endpoint.  While  shape 
estimators  of  various  sorts  exist  (kurtosis,  Hogg's  Q, 
percentile  ratios)  most  are  based  on  the  entire  sample 
thus  somehow  averaging  the  two  tail  shapes.  The  implicit 
assumption  is  that  the  distribution  is  symmetric.  In 
addition,  some  of  these  statistics  are  quite  sensitive  to 
sample  variations. 

A  thorough  Investigation  was  performed  on  a  series  of 
methods  which  adjust  the  linear  extrapolation  of  the 
sample  distribution  function  (based  upon  some  plotting 
rule)  to  account  for  the  estimated  shape  of  the  distri¬ 
bution  tail.  Although  several  of  the  methods  developed 
showed  a  capability  to  predict  an  endpoint  more  accurately 
than  a  linear  extrapolation,  they  were  occasionally  (less 
than  five  percent  of  the  cases  tested)  drastically  in 
error  and  did  not,  in  general,  perform  well  for  small 
samples.  The  reason  for  lack  of  robustness  and  poor  small 
sample  performance  was  the  paucity  of  information  in  the 
few  sample  points  in  the  tails.  The  methods  attempted 
will  be  described  briefly  as  they  may  have  some  applica¬ 
tion  in  cases  where  sample  sizes  greater  than  one  hundred 


are  available.  For  this  estimator  the  new  endpoint 
estimation  techniques  did  not  seem  to  significantly 
improve  the  overall  performance  ,  so  a  modified  linear 
extrapolation  method  was  used  to  fix  the  endpoints.  (Only 
the  left  endpoint,  x(o)»  estimate  will  be  discussed.  The 
right  endpoint,  x(a+i)>  ls  handled  symmetrically.) 

The  methods  investigated  were: 

1) 

x(0)  *  2x(l)  “  x ( 2 ) 

Chooses  as  an  endpoint  a  point  the  same  distance  to  the 
left  of  the  first  order  statistic  as  the  second  order 
statistic  is  to  the  right.  This  method  has  the  advantage 
of  simplicity  but  is  extremely  sensitive  to  sample  vari¬ 
ation  £ .  In  addition,  it  tends  to  give  poor  results  for 
distributions  with  light,  long  tails  and  for  those  with 
tails  heavier  than  the  uniform,  for  example  a  U-shaped 
Beta  . 

2) 

x(0)  *  x(m)  “  ^  x(  m)  _x(  1 )  )®m^  ^m”'7  1  ^ 

Choose  as  the  endpoint  a  linear  extrapolation  of  the 
points  (x(m).Gm)  and  1  <  m  <  n/2.  This  method 

reduces  the  sensitivity  of  the  estimate  to  sample 
variations  but  suffers  from  problems  similar  to  those  of 


method  1. 


>> 


3) 


I 


m  n/  2 

*(0)  ’  kl  E  (X"X(i))  +  k2  S  (*"x(i)) 

i  “  1  i  =  1 

X  -  sample  median 

Chooses  as  the  endpoint  a  point  based  on  two  averages 
relative  to  the  sample  median.  This  method  modifies 
method  1  to  make  it  more  robust  but  still  suffers  from  the 
problem  of  a  linear  estimator  trying  to  fit  a  non-linear 
function.  Method  3  also  requires  a  relatively  large  sample 
to  give  reasonable  results. 


4) 

x(0)  *  (l+kR)x(1^  -  kRX  0<k<l 

m  n/2 

R  *  (n/2m)  (X-x^))/  (X-x^j) 

1=1  M 


Chooses  as  the  estimate  a  linear  extrapolation  weighted  by 
the  function  R  which  is  a  measure  of  the  shape  of  the 
distribution  similar  to  Hogg's  Q  statistic.  This  method 
is  more  versatile  since  it  adjusts  the  slope  of  the  extra¬ 
polation  method  based  upon  the  sample.  Unfortunately,  the 
R  statistic  was  found  to  be  sensitive  to  sample  variations 
and  is  not  a  single  valued  function  of  actual  endpoint 
location . 


5)  x(o)  ”  quadratic  least  square  fit  to  the  points 
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(x(l)*Gi),  (x(2)*®2^*  ^x(3)*®3^ 

using  the  equation: 

G(x)  -  a(x  -  X(0))2  +  b ( x  -  X(qj) 

This  method  Is  a  quadratic  fit  to  the  data  subject  to  the 
constraint  that  the  resulting  equation  reaches  a  minimum 
at  the  point  (x(g)»0).  The  method  Is  quite  good  for 
distributions  with  long  tails  but  was  poor  for  uniforms, 
exponentials,  U-shaped  betas,  etc. 

6) 

x(0)  *  (x(0)l  '  K(x(m)-x(l)>>P' 

m-k 

?'  -  exp {  ^  I<*(l+k)-x(l+l)>/(x(l+k-l)-x(l)>l>  k<m 
1-1 

This  method  calculates  a  more  robust  percentile  ratio,  , 
and  adjusts  a  linear  extrapolation  based  on  the  value  of 
?' .  While  this  method  appears  to  have  merit,  in  practice 
the  value  of  P'  was  found  to  be  non-unique  for  widely 
varying  distributional  shapes,  and  quite  sensitive  to 
sample  variations  due  to  the  division  and  exponentiation 
opera  tors  . 

7)  Let 

h[(x(0)2“x(0)>/<x(m)-x(l)>l  “  p' 

then 

x(0)  *  x( 0) 2  "  <x(m)-x(l)>h_1<p'> 
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Where  *  x(0)  as  determined  by  method  2.  This  method 

determines  the  function  h  empirically  for  a  series  of  beta 
distributions  with  various  parameters  and  uses  the  inverse 
function  to  estimate  the  endpoint.  The  non-uniqueness  of 
P'  and  its  sensitivity  to  sample  variations  led  to  poor 
estimates  of  X(q)’ 

8)  This  method  was  the  same  as  method  7  except  P'  was 
replaced  by: 

m  m 

S  -  (■/  In ( n ) )  ^2  <x(m+l)-x(i)>4/l  ^  < x ( m+ 1 ) ~x ( i ) >1  2 J 2 
i*l  i-1 

The  method  is  inspired  by  the  sample  kurtosis  with  an 
empirically  defined  scaling  factor,  m/  ln(n),  included  to 
reduce  sensitivity  to  sample  size  and  the  fractional  por¬ 
tion  of  the  sample,  m/n,  used  in  the  calculation  of  S. 

Method  1  was  used  in  this  estimator  (modified  as 
described  below)  for  the  following  reasons: 

1)  The  ability  to  generate  a  density  estimate  for 
small  samples  was  desired.  All  other  endpoint  estimation 
schemes  require  larger  samples  than  method  1  to  give 
reasonable  results. 

2)  The  method  is  simple  with  no  subtle  pitfalls 
and  gives  reasonable  results  which  do  not  contradict  known 


facts  . 


Philosophically,  one  feels  that  the  entire  sample, 
rather  than  a  subsample,  must  be  "better"  for  approxi¬ 
mating  endpoints.  The  problem  with  picking  endpoints  and 
then  using  the  same  selected  values  in  the  calculations  of 
each  of  the  subsample  densities  is  that  too  much  prob¬ 
ability  tends  to  be  lumped  in  the  tails.  On  the  other 
hand,  allowing  each  subsample  to  determine  its  own  end¬ 
points  tends  to  spread  the  estimate  over  a  wider  support 
which  adversely  affects  estimator  performance  for 
densities  which  do  not  tend  to  zero  as  the  endpoint  is 
approached.  A  compromise  solution  was  developed  experi¬ 
mentally  which  seems  to  eliminate  both  these  problems. 

Define : 

X( o)  »  x(n+i)  *  estimates  of  endpoints  based  upon 

the  entire  sample 

*(0)i  »  x(n+l)i  “  estimates  of  endpoints  based  upon 

the  it^1  subsample 

x(0)i  »  x(q+i)i  “  endpoint  estimate  used  in  calcu¬ 
lating  density  from  the  it^1  subsampl 

Then 

x(0)i  “  max(x(0)i»x(0)) 

x(n+l)i  “  miQ(x(n+l)i»x(n+l) > 

As  can  be  seen  in  Chapter  IV. 2,  this  endpoint  estima¬ 
tion  technique,  when  applied  to  small  samples,  results  in 


an  excellent  approximation  to  the  1/n  and  (n-l)/n 
percentage  points  of  the  true  distribution  by  the  esti¬ 
mated  distribution.  Thus  the  errors  In  the  endpoint  esti¬ 
mates  due  to  this  method  are  insignificant  for  most  appli¬ 
cations  other  than  approximating  points  far  out  In  the 
tails  of  the  distribution. 


II. 5  Plotting  Position  Selection 


Plotting  positions  are  defined  as  a  set  of  cumulative 
probabilities  associated  with  a  set  of  ordered  observa¬ 
tions.  Their  purpose  stems  from  the  use  of  probability 
paper  (as  far  back  as  1396)  to  try  to  predict  distribu¬ 
tions  of  observed  random  variables.  They  were  commonly 
used  by  hydrologists  to  analyze  flood  data  (74).  Generally 
an  attempt  Is  made  to  approximate  some  point  in  a  distri¬ 
bution  by  choice  of  plotting  position,  for  Instance 
E[ F( x± )  ]  . 

As  Harter  (69)  points  out  In  his  excellent  summary  of 
the  history  and  use  of  plotting  positions,  much  of  the 
problem  regarding  the  choice  of  plotting  positions  Is  due 
to  the  fact  that 


F [ E( x i ) ]  *  E( F( x  ^ ) ]  -  1/ (n  +  1) 


except  for  a  uniform  distribution.  The  median  ranks 
choice  of  plotting  position  is  attractive  for  the  case  of 
a  single  point,  since  for  monotonic  functions  the  median 
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of  the  function  Is  the  function  of  the  median.  Unfortu¬ 
nately  this  is  not  true  for  functions  of  more  than  one 
random  variable. 

Table  3  shows  some  of  the  historically  more  popular 
choices  of  plotting  positions.  A  small  amount  of  empir¬ 
ical  investigation  of  many  of  these  plotting  positions  was 
done  but  there  was  no  obviously  better  choice  for  the 
determination  of  the  density  estimate.  Harter  (69)  gives 


Table  3  -  Plotting  positions  of  the  i11*1  Order  Statistic 


F(  x  ) 

Description 

1  . 

i/n 

value  of  the  empirical 

distribution  function  (EDF) 

2  . 

i/(n+l) 

mean  rank 

3  . 

(1-1)/ (n+1) 

mode  rank 

4  . 

( i- • 3 ) / (n+ .  4) 

median  rank  (approximation) 

5  . 

(i-.5)/n 

midpoint  of  the  jump  of  the  EDF 

6. 

(n(2i-l)-l)/(n-l) 

average  of  mean  and  mode  ranks 

7  . 

(1  —  .375)/ (n+ .2  5 ) 

efficient  approximation  for  the 
normal  distribution 

8. 

(i-a)/(n-a-b+l) 

a,b<l 

Blom's  plotting  position  (15) 

9  . 

( i+a ) / ( n+b ) 

- 1 <a <b a 

a  more  general  plotting  position 

a  detailed  analysis  of  choices  of  plotting  rules  and  that 
will  not  be  repeated  here.  For  the  purposes  of  this  work, 
the  approximate  median  ranks: 

Gt  -  (i-.3)/(n+.4) 

plotting  positions  were  used.  This  is  the  same  approach 
taken  by  Sweeder. 

While  the  plotting  rule  does  not  in  itself  greatly 
affect  the  estimate,  in  conjunction  with  subsampling  it 
does.  The  reason  for  this  is  that  equal  areas  are  forced 
into  unequal  Intervals  at  the  ends  of  the  subsample. 
Figures  6  and  7  illustrate  this  problem.  In  Figure  5  we 
have  a  sample  shown  along  with  the  density  generated  with 
no  subsampling.  If  we  subsample  twice  we  obtain  Figures  6 
and  7  which  are  averaged  to  get  the  smoothed  estimate, 
Figure  8.  Note  that  the  subsamples  and  the  resulting 
estimate  tend  to  have  peaks  near  the  endpoints.  This  is 
due  to  forcing  the  estimate  to  generate  too  much  area  at 
the  ends.  For  example,  in  subsample  one,  approximately 
one-fifth  the  area  is  generated  between  X(q)  ®nd  x(l)» 
while  this  Interval  is  only  one  of  the  nine  defined  by  the 
endpoints  and  sample  points.  Sweeder's  estimates  fre¬ 
quently  showed  a  characteristic  hump  near  the  endpoints, 
which  was  due  to  this  phenomenon. 

A  solution  to  this  problem  is  to  reapportion  the  area 
generated  between  the  points  of  the  subsample.  A  new  set 
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lgure  6  -  Density  Estimate  Generated  from  Subsample  One 


SAMPLE 


Figure  7  -  Density  Estimate  Generated  from  Subsample  Two 


SAMPLE 


of  plotting  points  are  chosen  such  that  the  values  of  G^s 
and  Gas,  the  first  and  last  subsample  plotting  positions, 
are  equal  to  the  plotting  positions  for  the  corresponding 
-point  in  the  entire  sample.  The  plotting  positions  for 
the  rest  of  the  subsample  points  are  determined  by  simply 
d  ivid ing  Gls-Gns  by  the  number  of  intervals  remaining. 
This  has  the  result  of  making  the  estimated  subsample 
probability  density  function  values  more  closely  represent 
the  entire  sample  density  function  in  the  tails,  while 
taking  advantage  of  the  smoothing  properties  of  sub¬ 
sampling  throughout  the  rest  of  the  support.  Since  this 
approach  is  equivalent  to  selecting  a  different  set  of 
(assymetric)  plotting  positions,  the  convergence  proper¬ 
ties  of  the  estimator  remain  unchanged. 

11.6  Example  Problem 

The  following  example  illustrates  the  use  of  the  new 
density  estimator.  Data  used  represents  the  lifetimes  of 
eight  grinding  wheels  and  are  extracted  from  Table  11.10 
of  Kapur  and  Lamberson  (88). 

X  -  (22,25,30,33,35,52,63,104) 

First  we  will  calculate  an  estimate  with  no  subsampling  or 
smoothing  to  illustrate  the  technique.  Following  this  we 
will  calculate  the  smoothed  estimate  as  described  in  the 
earlier  portions  of  this  Chapter. 


The  plotting  positions  are: 


G  -  -llll-  -  (.0833  , .2  024  , .32  14  , .4405 
n+  .4 

.  5595  , .  67  86  , .7976  , .9  167  ) 

Endpoints  are  calculated  as: 

X  q  =  19  Xg  *  14  5 

The  forward  pass: 

f0  *  0 

f^Xj/2  *  .0833  ->  fx  -  .0238 

f2  -  .0238 
f3  -  .0238 
f4  -  .0555 
f5  -  .0635 
f6  <  0 

So  we  set  *6  "  ® 

And  recalculate  ■  (.47  6  -  f ^( x^~x 3) )/ ( 

Continuing  ^7  *  .0216 

f8  <  0 

So  we  set  ^8  “  ® 

And  recalculate  t-j  “  .0092 

f9  -  .004  1 

We  now  calculate  the  backward  pass: 

f9  0 

fa  A  Tf.nl  2  -  .0833  ->  fp  -  .004  1 


Recalculate 
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f6  “  • 
f4  -  .119 

f3  <  0  *>  f3  -  0 

Recalculate  f4  *  .0952 

f2  -  .047  6 
fx  -  .0317 
f0  -  .0238 

Averaging  the  forward  and  backward  passes  yields: 

f  -  (  . 01  19,. 0436,  .0357,  .01  19,.  0  754,  .0082, 

.0082, .0055, .0021 , .0021) 

The  result  of  this  estimation  is  shown  in  Figure  9.  Note 
the  amount  of  noise  In  even  this  simple  estimate. 

Ue  now  calculate  the  "optimal”  number  of  subsamples  as 
two  and  obtain  the  smoothed  estimate. 

Yj  -  (22,30,35,63) 

Y  2  -  (25,33,52,104) 

y  *  max(19,14)  ■  19  y^  •  min(91,145)  *  91 

y 2q  *  max(19,17)  ■  19  y2^  -  min(145,156)  *  145 

The  first  subsample  is  augmented  with  the  endpoints 

(  19  ,22,30,35,63,91) 


ION 


Plotting  positions  are  calculated 


Gl  «  .7/  8.4  -  .0833  G4  *  3. 7/4. 4  -  .8409 
G2  =  (G4+2G^)/  3  =  .335  8  G3  -  (2G4  +  GL)/3  =■  . 

Forward  pass  is  calculated  as  before: 

f0  -  0 
fj  *  .055  5 
f 2  “  .007  6 
f3  *  .09  34 
f4  <  0  ->  f4  -  0 

Recalculate  f3  -  .0292 

f5  -  .0114 

Backward  pass  is  now  calculated: 

f5  *  0 

f4  -  .0114 
f3  -  .0066 
f2  -  .0944 
fx  <  0  ->  fx  -  0 

Recalculate  f2  *  .0763 

f0  -  .05  5  5 

The  first  subsample  estimate  is  given  by  the  pairs 

(  19  , .027  8)  (  22  , .027  8)  (  30  , .0420) 

(  35  , .0179)  (  63  , .0057)  (9  1  , .0057) 


Performing  similar  calculations  for  the  second  subsample 
we  obtain  the  pairs: 


(  19  ,.0055) 

(25  ,.04  7  6) 

(  33  , .0156) 

(52  , .0086) 

(  104  ,.002  1) 

(  145  ,.002  1) 

We 

now  smooth  the  estimates  using 

A 

*i  * 

<*1-1  ♦  2<i  + 

fi+l)/4 

to 

obtain,  for  the  first  subsample: 

(  19  ,.027  8) 

(22  ,.0314) 

(  30, .0324) 

(35, .0209) 

(63  , .0088) 

(9  1  , .0057) 

and 

for  the  second  subsample: 

(  19  ,.0055) 

(25  ,.029  1) 

(33, .02  19) 

(  52  , .0087) 

(  104  ,.0037) 

(  145  , .0021) 

Note  that  at  this  point  the  density  estimates  no  longer 
integrate  to  one  since  the  smoothing  operation  is  not 
weighted  by  the  sample  Intervals.  This  will  be  corrected 
after  averaging,  but  first  we  must  interpolate  within  each 
sample  to  find  the  values  at  corresponding  x  coordinates. 

The  first  subsample  provides: 


(  19  ,.027  8) 

(22  ,.03  14) 

(  25  ,.03  18) 

(30, .0324) 

(  33  , .0255) 

(35, .0209) 

(  52  ,.01  36) 

(63, .0088) 

(  9  1  ,.0057) 

( 104,0.0) 

(145,0.0) 
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Note  that  there  is  additional  area  added  implicitly 
between  the  points  with  x  coordinates  of  91  and  104.  This 
provides  additional  smoothing  for  the  transition  between 
subsample  estimates. 

The  second  subsample  provides: 


( 19 ,.0055) 

(  22  , .0  17  3) 

(  2  5  ,.029  1) 

( 30, .0240) 

(  33  ,-02  19) 

(  35  ,-0205) 

(  52  ,.0087) 

(  63  , .007  6) 

(9  1  ,.0049) 

(  104  ,-0037) 

(  14  5  , .002  1) 

Averaging  the 

two  subsample  estimates  we  obtain: 

(19,-0167) 

(22  , .0244) 

(25  ,.0305) 

(30,-0282) 

(  33  , .0237) 

(  35  ,.0207) 

(52  ,.0112) 

(63, .0082) 

(91,-0053) 

(104,-0019) 

{  145,-0011) 

Integrating 

,  we  see  that 

the  total 

area  under 

curve  is  1.08815 

so  we  divide  each  of  the 

pdf  estimates 

this  value  to 

obtain  our  final 

estimate: 

(  19,-0153) 

(22  ,.0224) 

(  25  ,.0280) 

(30, .0259) 

(  33  ,.02  18) 

(  35  ,.019  0) 

(  52  ,.0103) 

(63  ,.0075) 

(91  ,.0049) 

(  104  ,.0017) 

(  145  ,.0010) 

Figure  10  shows  this  estimate.  Also  shown  is  the  Ueibull 
fit,  determined  using  Ueibull  probability  paper,  given  by 
Kapur  and  Lamberson.  Since  the  two  density  estimates  are 
claarly  different,  the  natural  question  is  which  is 
better.  There  is  no  answer  to  this  question,  however  we 


can  calculate  the  likelihood  of  the  sample  for  each  of  the 
distributions  (assuming  independent  sample  points.)  When 
we  do  this  the  value  for  the  Weibull  estimate,  Ly,  and  the 
value  for  the  new  estimate,  L,  are: 

Lw  -  9.5  x  1 0 " 2  1 

L  *  8.4  x  10“16 

Clearly  this  particular  sample  is  much  more  likely  to  be 
from  the  new  density  estimate.  It  is  also  interesting  to 
calculate  the  likelihood  of  this  sample  coming  from  a 
uniform  distribution,  since,  for  very  small  samples,  a 
uniform  distribution  tends  to  maximize  the  likelihood. 
The  likelihood  for  a  uniform  distribution  is: 

L0  <  4.9  x  10~16 

One  other  question  remains  along  these  lines  and  that  is 
the  question  of  what  the  subsampling  and  smoothing  have 
done  to  the  likelihood  of  the  sample.  If  we  calculate  the 
likelihood  of  this  sample  for  the  unsmoothed  estimate  with 
no  subsampling  we  obtain: 

L  =  1.1  x  10"15 

The  significance  of  these  numbers  is  subjective.  They  are 
presented  more  as  a  matter  of  interest  than  as  a  claim  for 
the  quality  of  the  new  estimator. 


•  ■*  -*  -  • 
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III.  Quality  of  the  Estiaator. 

Now  that  we  have  developed  the  new  no n -p a r a m e t r 1 c 
estimator,  an  evaluation  of  the  quality  of  this  estimator 
Is  required.  The  approach  taken  In  this  chapter  Is  to 
obtain  Monte  Carlo  estimates  of  the  accuracy  of  the  new 
estimator  and  compare  these  results  with  existing  density 
estimators.  Wegman  (226)  provides  a  discussion  of  and 
results  from  several  other  n o n - p a r a m e t r  1  c  density 
estimates.  Tapia  and  Thompson  (203)  discuss  properties  of 
some  other  estimators  and  Sweeder  (202)  also  provides  some 
comparisons.  In  order  to  compare  with  these  other 
results,  twenty  five  Monte  Carlo  repetitions  using  samples 
of  size  one  hundred  were  used  In  this  study.  The  results 
presented  here  are  consistent  with  Monte  Carlo  studies 
using  one  hundred  repetitions. 

The  measure  of  merit  most  frequently  used  In  these 
comparisons  Is  Mean  Integrated  Square  Error  (MISE)  or  fre¬ 
quently  MISE  Is  approximated  by  the  average  squared  error 
(ASE).  The  following  measures  of  merit  were  considered  as 
alternatives  In  this  study: 

1)  Anderson-Darling  Integral 

2)  Modified  And e r s on -Da r 1 lng  Integral 

3)  Average  Square  Error 

4)  Cramer-von  Mlses  Integral 

5)  Modified  Cramer-von  Mlses  Integral 


6)  Kolmogorov-Smirnov  Integral 

7)  Modified  Kolmogorov-Smirnov  Integral 

8)  Integrated  Square  Error 

9)  Integrated  Absolute  Error 

10)  Maximum  Absolute  Deviation 

Although  all  of  the  above  measures  gave  different  num¬ 
bers,  relative  comparisons  indicated  that  only  Maximum 
Absolute  Deviation  varied  significantly  from  the  others. 
This  was  true  primarily  at  the  points  of  discontinuity  in 
the  density  function.  Thus,  for  the  purposes  of  this 
study.  Average  Square  Error  (approximately  equal  to  MISE) 
is  used  when  comparisons  are  made  with  other  density 
estimators,  and  Integrated  Absolute  Error  Is  used  for  two 
sample  tests  since  the  numbers  retain  more  of  an  intuitive 
feel,  at  least  to  the  author.  Based  upon  results  of 
investigations  of  those  ten  measures  of  merit,  the 
research  results  would  not  change  substantially  if  any  of 
the  first  nine  measures  were  used  and  possibly  not  even 
for  the  last.  For  derivations  and  definitions  of  these 
error  measures  see  Sweeder  or  Sahler  (165). 

Both  the  estimated  probability  density  function  and 
cumulative  distribution  function  are  compared  with  other 
estimation  techniques.  Tables  4  and  5  show  the  results 
of  these  comparisons.  The  results  of  the  density  function 
comparisons  indicate  that  the  new  estimator  is  clearly 
superior  for  platykurtic  distributions,  equal  to  the  best 


Table  4  -  Comparison  of  Density  Estimators,  Average  Square  Error  (n«100) 


Table  5  -  Comparison  of  Distribution  Estimators,  Average  Square  Error  (n*100) 


previous  estimates  for  mesokurtic  distributions,  and 
slightly  Inferior  to  Sweeders  for  leptokurtic 
distributions.  A  small  number  of  cases  have  been  run  in 
each  of  the  distribution  classes  with  Cauchy,  exponential, 
and  various  beta  distributions.  These  additional  runs 
show  that  uniform,  normal,  and  double  exponential  distri¬ 
butions  are  indeed  representative  of  these  other  distribu¬ 
tions  . 

The  plots  shown  in  Figures  11  to  16  demonstrate  graph¬ 
ically  the  ability  of  the  estimator  to  fit  various 
densities.  The  normal,  uniform,  and  double  exponential 
plots  show  the  maximum  likelihood  estimate  (parametric) 
for  the  random  sample.  Actual  data  show  that  the  non- 
parametric  estimate  has  smaller  MISE  than  the  parametric 
estimate  for  the  normal  31Z  of  the  time  for  sample  size 
100.  These  plots  are  the  best  results  obtained  from  one 
hundred  random  samples  from  each  distribution.  The 
following  plots,  Figures  17  to  21,  are  the  best  results 
obtained  from  ten  samples,  each  of  size  one  hundred,  from 
the  distributions  shown. 

The  results  for  the  distribution  function  are  quite 
similar,  showing  consistent  improvement  over  the  empirical 
distribution  function,  and  errors  comparable  to  the  ex¬ 
tremely  good  values  achieved  by  Sweeder.  Note  that  the 
estimator  was  "optimized”  to  provide  a  good  density 
estimate.  Improvement  in  the  estimate  of  the  distribution 
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es 


Estimated  Cauchy  Densities 


function  is  possible  if  one  does  not  worry  about  the  den¬ 
sity  function.  During  the  development  of  this  estimator, 
it  appeared  that  fewer  smoothing  operations  would  improve 
the  quality  of  the  distribution  function  estimate  slightly 
Figures  22  to  24  show  the  actual,  estimated  and 
empirical  distribution  functions  corresponding  to  the 
estimates  shown  in  Figures  14,  15  and  16  respectively. 

The  improvement  over  the  empirical  distribution  function 
is  clearly  evident  in  these  plots.  Also  notice  the 
smoothness  of  the  estimated  distribution  function. 
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Figure  22  -  Distribution  Function  Estimates  for  Uniform  (n»10) 
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IV.  Applications 


This  section  deals  with  ways  of  applying  the  new 
estimator  derived  in  the  previous  chapter.  The  first 
application,  distance  estimation,  is  included  here  because 
one  frequently  wishes  a  simpler  formulation  for  the  den¬ 
sity  function  in  order  to  perform  some  other  application. 
The  small  sample  analysis  section  demonstrates  the  use  of 
the  new  distribution  function  to  perform  a  task  which  is 
poorly  accomplished  by  the  more  traditional  distribution 
function.  The  two  sample  test  uses  the  estimated  density 
directly  to  improve  the  power  of  an  existing  test.  Many 
more  applications  are  possible,  and  a  few  of  these  are 
discussed  in  the  section  on  other  applications. 

IV. 1  Distance  Estimation 

One  of  the  problems  with  non-parametric  estimators  is 
the  difficulty  in  using  the  resultant  estimate.  This  Is 
due  to  the  form  of  the  estimator  and  the  number  of 
'‘parameters'’  required  to  specify  which  of  the  members  of 
that  form  is  the  estimate.  For  instance,  in  order  to 
completely  specify  the  proposed  density  estimate  at  least 
2n+4  "parameters”  and  possibly  as  many  as  3n  (depending  on 
subsample  endpoints)  are  required.  It  would  be  useful  to 
reduce  the  number  of  parameters  by  converting  to  another 
functional  fora.  A  logical  approach  to  this  is  to  try  to 
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convert  the  non-pa ra me t r 1 c  estimate  to  one  member  of  a 
family  of  parametric  distributions.  This  has  been  accom¬ 
plished  through  minimum  distance  estimation. 

In  general,  the  minimum  distance  estimate  Is  a  set  of 
parameter  values  minimizing  some  distance  function  defined 
in  terms  of  the  hypothesized  probability  structure  and  the 
sample  generated  estimate.  Originally  developed  by 
Wolfowitz  (234),  minimum  distance  estimation  attempts  to 
match  the  entire  probability  structure  (rather  than  the 
moment  structure)  of  the  data  with  a  specified  model. 
Minimum  distance  estimators  have  been  proposed  by  Beran 
(12),  Parr  (135),  Daniels  (35)  and  others  (See  136)  and 
have  been  shown  to  possess  some  significant  robust  charac¬ 
teristics.  However,  most  of  the  Monte  Carlo  investiga¬ 
tions  of  these  estimators  have  been  confined  to  distribu¬ 
tions  close  to  the  normal. 

A  series  of  logical  candidates  for  the  distance 
estimation  task  was  considered.  These  include: 

1)  General  Exponential  Power  Distribution.  This  distri¬ 
bution  models  symmetric  data  which  can  be  extremely  lepto- 
kurtic  or  platykurtic.  Extremes  of  the  distribution  occur 
at  the  uniform  model,  where  p  *  infinity,  and  at  the 
double  exponential,  where  p  ■  1.  Moderate  tail  lengths  are 
a  function  of  the  shape  parameter,  p.  For  p  -  2,  the 
distribution  becomes  Normal.  The  probability  density 
function  is: 
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f(x;p  ,M»or)  -  {pg(p)/2r(l/p)cr]exp[-|g(p)(x-^)/a  |p] 


A)  Generalized  t  Distribution.  This  distribution  models 
symmetric  data  with  moderate  to  extremely  leptokurtic 
distributions.  As  the  degrees  of  freedom  parameter,  n, 
approaches  infinity,  the  distribution  approaches  the 
normal.  For  n*l,  the  distribution  reduces  to  the  Cauchy. 
The  general  density  function  is: 

f(x;  n,(T,n)n  t(n  +  l)/2]  t  l+<  x-M)  2 1  o  2n  ]  <  n  + 1  >  /  2  /  r(n/ 2  )  a(  tt  n ) 
-  oo  <  x ,  #*  <  0  <  cr  <  ao  1  £  n  <  co 

5)  R-S  Distribution.  This  distribution  was  originally 
developed  by  Ramberg  and  Schmeister  (155)  to  generate 
random  variates.  It  is  a  generalization  of  Tukey's  lambda 
function  (  205)  and  can  be  used  to  model  a  wide  variety  of 
data  shapes.  The  probability  density  function  is  given  in 
terms  of  the  percentile  function,  R(p). 

f(x;p,a,b,c,d)  ■  f(R(p))  -  ( cpc" *+d ( 1-p) d_1 ) /b 

R(p)  -  a+(pc-( l-p)d)/b 

-oo  <  a  x  <  oo  -oo  <  b,c,d  <cc  0£p<l 

6)  Generalized  Life  Model.  Developed  by  Moore  and 
Bilikam,  this  model  includes  as  special  cases  the  Ueibull 
and  the  Raleigh  distributions.  The  probability  function 
Is  given  by: 


f (x;a ,b  ,g(x) )  “  bg' ( x) ( g ( x) ) b" 1  exp [ -( g ( x) ) b/a  ] / ; 


g  ( x)  e  R 


lim  g ( x )  *  0 
x->0 


11m  g ( x)  ■  « 
x— >oo 


g(x)  strictly  Increasing 


0  <  x  ,a  ,b  <  oo 


The  Generalized  Beta  Distribution  was  chosen  as  the 
family  to  consider  for  parameterizing  the  estimate.  The 
distance  measures  considered  were  those  discussed  in 
Chapter  III,  and  approximate  MISE  was  chosen  as  an  accept¬ 
able  measure  based  upon  tests  of  ten  samples  and  the 
variability  of  the  measure.  Both  the  estimated  prob¬ 
ability  density  function  and  estimated  cumulative  distri¬ 
bution  function  were  evaluated  in  the  minimization  scheme, 
but  the  probability  density  function  based  method  was 
eliminated  due  to  convergence  problems  In  the  optimization 
procedure  and  the  frequency  of  local  minima. 

The  distance  measure  was  minimized  using  the  routine 

ZXMIN  on  the  International  Mathematical  and  Statistical 

Libraries.  For  each  sample  three  starting  points  were 

used  which  were  chosen  as  the  minimum  distances  out  of  a 

grid  of  elghty-one  points.  The  Beta  parameters  were 

bracketed  by  choosing  the  largest  spike  in  the  non-para- 

metrlc  estimate  of  the  density  and  adjusting  the  sizes  of 

p  and  q  by  the  ratio  of  the  number  of  points  to  the 
nd  x  sax 

left  of  the  average  of  ^  and  X(Q)  bo  the  number  of 
points  to  the  right  of  the  average. 


Points  to  Left  of  Average 
Points  to  Right  of  Average 


B  3  ma  x 


i  / ( n  +  1 ) 


x(l)-*(l-j) 


i  *  1  ^  2  y  •  •  •  ^  n 


i-j  3  1  ,2  ....  ,n 


Then 


max(  3  ,B/R) 


raax( 3 , BR) 


The  points  for  evaluating  the  distance  function  were  then 
chosen  as: 


Pi  *  iPmax/10 


1  ,  2  ,  .  .  .  ,  9 


i  - 


<U  *  ^max'10 


1  -  1,2 . 9 


The  distance  measures  were  calculated  at  each  of  the 
elghty-one  starting  points  and  the  best  three  points  were 
chosen  for  starting  a  modified  gradient  search.  If  two  of 
the  three  starting  points  did  not  converge  to  the  same 
Beta  parameters,  additional  starting  points  were  chosen. 
If  the  parameters  converged  to  some  point  outside  Paax  or 
qmax  then  additional  points  were  checked  to  investigate 
the  possibility  of  a  local  minimum. 
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Table  6  -  Average  square  error  for  basic  and 

parameterized  estimates,  n»100 


D is  t r ib u t ion 

Average  Square  Error 

pdf 

CDF 

Basic 

Beta  fit 

Basic 

Beta  fit 

Cauchy 

.0143 

.1135 

.00270 

.00301 

Lap  lace 

.0059 

.0355 

.00151 

.00906 

No  rraa  1 

.0012 

.0003 

.00054 

.00022 

Uniform 

.0048 

.0041 

.00104 

.00042 

Be  ta ( . 6 , . 8 ) 

.0061 

.0020 

.00181 

.00127 

Beta (2,3) 

.0010 

.0006 

.00054 

.00038 

Results  of  the  distance  estimation  using  a  beta  dis¬ 
tribution  are  shown  in  Table  6.  For  beta  family  and 
distributions  with  lighter  tails  than  the  normal  the 
method  resulted  in  acceptable  estimates.  For  distribu¬ 
tions  with  heavy  tails,  double  exponential,  Cauchy,  etc., 
the  method  results  in  a  poor  fit.  This  is  reasonable 
since  the  beta  distribution  has  lighter  tails  than  the 
normal  even  as  the  support  becomes  very  large. 

The  distance  estimation  results  are  based  upon  medians 
of  twenty-five  minimum  distance  fits  to  each  of  the  dis¬ 
tributions.  The  small  number  of  runs  was  a  result  of 
difficulties  in  getting  the  minimization  procedure  to 
converge  and  the  large  number  of  local  minima,  particu¬ 
larly  with  the  more  leptokurtic  distributions. 
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IV. 2  Application  to  Snail  Sanple  Analysis 

There  are  nany  Instances  where  the  cost  of  obtaining  a 
single  sanple  point  of  a  random  variable  is  extrenely 
high.  This  is  particularly  true  in  instances  of  testing 
or  analyzing  complex  physical  systems,  for  example  air¬ 
craft  stress  analysis.  In  these  cases,  one  is  sometimes 
asked  to  make  an  estimate  based  upon  a  mere  handful  of 
data  points.  The  estimator  we  have  developed  can  be  used 
as  a  variance  reduction  technique  in  cases  such  as  these. 

As  a  theoretically  interesting  example,  consider  the 
determination  of  rr  by  a  Monte  Carlo  technique.  If  we 
consider  a  circle  inscribed  within  a  unit  square  as  in 
Figure  25  and  generate  uniformly  distributed  random 
variable  pairs  (x,y)  on  [0,1)  x  [0,1],  then  the 


Figure  25  -  Geometry  for  Calculation  of  rr 


probability  of  any  (x,y)  pair  lying  within  the  circle  is 
equal  to  the  ratio  of  the  circle's  area  to  the  square's 
area  or  jt/4.  If  we  wish  to  calculate  an  estiaate  of  tt  we 
.nay  do  so  by  taking  the  ratio  of  hits,  (x,y)  pairs  within 
the  circle,  to  total  pairs  generated  and  multiply  by  4. 

,  Pairs  in  Circle  ^ 

Total  Pairs 

Another  way  of  looking  at  this  problem  is  to  define  a 
new  random  variable 

Z  *  X2  +  Y2 

then : 

ff  -  4P(  z  <  1)  -  4PZ( 1) 

If  we  calculate  the  distribution  of  z  we  may  estimate  ir  in 
this  manner. 

Figure  26  shows  the  results  of  estimating  ir  by  the 
Monte  Carlo  method  and  by  the  distribution  function 
method.  The  curves  shown  for  the  distribution  function 
method  are  based  upon  Monte  Carlo  analysis  with  one 
thousand  repetitions  at  each  sample  size  (2,4,5,10,20, 
40,100).  The  curves  for  the  Monte  Carlo  method  are  deter¬ 
mined  using  binomial  probabilities.  In  addition,  the  per¬ 
centage  of  times  the  distribution  function  method  beat  the 
Monte  Carlo  method  was  calculated.  Table  7  presents  these 


results 


als  for  Snail  Sanple  Technique 


This  analysis  would  tend  to  Indicate  that  for  this 
type  of  problem  one  would  wish  to  use  the  distribution 
function  method  for  sample  sizes  less  than  about  fifty 
rather  than  inferring  information  directly  from  the  sample 
Itself.  This  is  certainly  true  when  sample  sizes  get  down 
to  about  five.  Note  that  if  a  larger  number  of  samples  is 
available  the  Monte  Carlo  method  is  preferable  when  a 
single  point  in  the  distribution  is  sought. 


the  unifora,  normal,  and  double  exponential  distributions. 

The  errors  shown  are  actual  squared  error  calculated 
from  100  random  samples  of  sizes  ten  and  100  from  each  of 
the  distributions.  The  horizontal  axis  data  is  first 
scaled  to  make  F~*(.99)  -  F“*(.01)  *  1  for  each  distribu¬ 
tion.  Then  the  plotted  values  represent: 

[ F”l( i/ 100)-P“l( i/ 100) ) 2  1-1,2, . . . ,99 

Ninety-five  percent  of  the  errors  are  less  than  or  equal 
to  the  curves  shown.  In  all  cases,  the  median  error  was 
approximately  one  order  of  magnitude  smaller  than  the  95% 
confidence  bound. 

Although  the  errors  appear  to  decrease  as  one  esti¬ 
mates  a  point  farther  out  in  the  tail  of  a  distribution, 
this  is  not  entirely  accurate.  For  distributions  with 
infinite  support  the  errors  increase  again  as  a  smaller 
percentage  point  is  estimated.  This  is  due  to  the  inher¬ 
ent  finite  support  of  the  estimator.  A  reasonable  guide¬ 
line  would  be  that  the  sample  size  should  be,  as  a  mini¬ 
mum,  approximately  equal  to  the  Inverse  of  the  desired 
percentage  point.  That  is,  if  we  wish  to  estimate  x  001 
we  should  start  with  a  sample  size  of  about  1000  or 
greater  points.  The  increase  of  accuracy  near  the  extreme 
percentage  points  apparent  in  Figures  27  and  28  is  due 
primarily  to  the  accuracy  of  the  endpoint  estimate. 


If 


Figure  27  -  95X  Upper  Confidence  Bound  on  Errors 


IV. 4  Two  Sample  Test 


The  classical  two-sample  tests  take  two  raadom  samples 
from  Independent  distributions  and  test  for  some  simi¬ 
larity  (equivalence,  same  location,  same  scale)  in  the 
underlying  distributions  based  on  the  sample.  The  usual 
approach  is  to  combine  the  two  samples  into  one  and  use  as 
a  test  statistic  some  function  of  the  ordering  of  the 
combined  sample.  Some  popular  tests  of  this  genre  are  the 
W a  Id -W o 1 f o w 1 1 z  test  (218),  the  Smirnov  test  (190),  the 
Cramer-von  Mises  test  (56),  the  median  test  (56),  and  the 
Mann-Whitney  test  (117).  Many  other  two-sample  tests 
exist,  but  none  use  the  samples  independently  to  evaluate 
a  probability  density  function  and  then  use  the  resultant 
density  estimates  in  testing  the  null  hypothesis.  There 
are  two  good  reasons  for  this.  The  first  is  that  theo¬ 
retical  distributions  of  the  test  statistic  are  virtually 
impossible  to  derive  due  to  the  complexity.  The  second  is 
that  the  noisy  properties  of  a  density  function  estimate 
can  tend  to  obscure  the  true  underlying  density  unless  the 
estimator  is  very  good  indeed.  The  first  problem  can  be 
somewhat  overcome  by  using  Monte  Carlo  methods  to  develop 
the  critical  values  of  the  test  statistic.  The  second  has 
not,  to  this  author's  knowledge,  been  previously  success¬ 
fully  attempted. 

The  ability  to  generate  a  density  function  which 
accurately  represents  the  sample's  underlying  density 


suggests  that  a  test  of  the  hypothesis  that  two  samples 
come  from  the  same  distribution  might  be  more  powerful  if 
performed  upon  the  densities  rather  than  on  distribution 
functions.  Power  studies  on  two-sample  tests  are  not 
generally  available  in  the  literature,  mostly  due  to  the 
difficulty  in  picking  one  of  the  infinite  alternative 
distributions  to  test  against.  The  two-sample  Smirnov 
Test  is  generally  felt  to  be  as  powerful  as  any  (56)  of 
the  popular  two-sample  tests  and  will  be  used  for  compari¬ 
son  purposes  in  the  development  of  a  new  two-sample  test. 

The  uniform,  normal,  beta  (1,3),  beta  (.5, .5),  and 
Laplace  distributions,  all  with  zero  mean  and  unit  vari¬ 
ance,  were  used  as  representative  distributions  in  the 

development  of  the  critical  values.  One  hundred  estimates 

V. 

were  made  from  samples  of  sizes  ten  and  one  hundred  from 
each  distribution.  The  test  statistic  used  was  the  inte¬ 
grated  absolute  error  between  all  possible  combinations  of 
two  estimates.  This  resulted  in  4950  samples  of  the  test 
statistic  for  each  distribution,  a  total  of  24750  points. 

The  null  hypothesis  to  be  tested  was: 

H0  :  F0  «  Fj  or  fQ  -  fx 

against  the  two-sided  alternative  hypothesis: 


Tests  using  both  the  cumulative  distribution  function  and 
the  probability  density  function  were  developed  and  com¬ 
pared  to  the  Smirnov  test  applied  to  the  same  samples. 


The  critical  values  of  the  test  statistic  are  given  in 
Table  8.  The  cumulative  distribution  function  test 

Table  8  -  Critical  Values  of  the  Two-Sample 

Test  Statistic 


S ignif icance 

CDF 

Tes  t 

pdf  Test 

Leve  1 

n  -  10 

n  -  100 

n  -  10 

n  -  100 

.001 

10.258 

.07861 

1.3814 

.01130 

.005 

10.185 

.07707 

1.1618 

.01058 

.01 

10.043 

.07601 

1.1294 

.00988 

.05 

4.1517 

.03263 

.69  8? 

.00711 

.10 

2.6214 

.02067 

.4915 

.00484 

.15 

2.1287 

.01339 

.3467 

.00436 

.20 

1.7304 

.01108 

.2975 

.00400 

.25 

1.4039 

.00953 

.249  1 

.00375 

.30 

1.0826 

.00792 

.2041 

.00352 

.35 

.8904 

.00654 

.1179 

.00336 

.40 

.7795 

.00536 

.1609 

.00319 

.45 

.6610 

.00467 

.1454 

.00305 

.50 

.5487 

.00410 

.1291 

.00292 

statistics  ate  calculated  using  the  same  results  as  those 
used  in  the  pdf  critical  value  computations.  They  are 
given  primarily  to  allow  a  direct  power  comparison  with 
-the  Smirnov  test  for  this  particular  set  of  samples. 

The  critical  values  In  Table  8  were  used  In  a  power 
study  with  all  possible  combinations  of  the  chosen 
samples.  This  resulted  in  10000  samples  from  each  pair  of 
dissimilar  distributions.  The  power  of  the  test  was  cal¬ 
culated  by  taking  the  proportion  of  samples  correctly 
rejected  as  failing  to  meet  the  hypothesis.  Tables  9,  10, 

11,  12,  13,  and  14  show  a  comparison  of  the  powers  of  the 
three  tests. 

The  power  studies  indicate  that: 

1)  The  test  based  on  the  new  probability  density 
function  is  consistently  more  powerful  than  the  cumulative 
distribution  function  based  test  developed  here. 

2)  The  Smirnov  test  is  slightly  more  powerful  than 
the  probability  density  function  test  for  uniform  alterna- 
t ives  . 

3)  The  test  based  on  the  probability  density  func¬ 
tion  is  clearly  superior  for  differentiating  a  normal  from 
a  double  exponential  sample. 

Figures  29(a)  and  (b)  show  the  distribution  and  den¬ 
sity  functions  of  the  distributions  used  in  the  power 
studies  for  this  test.  One  could  argue  that  the  normal 
and  double  exponential  are  more  clearly  distinquishable  in 


Level  Normal  Uniform 


Table  9  -  Power  Comparisons  for  Two-Sample  Tests 


Normal 


Level  Normal  Laplace 


Table  11  -  Power  Comparisons  for  Two-Sample  Tests 


Smirnov  CDF  -  Test  pdf  -  Test  Smirnov  CDF  -  Test 


ons  for  Two-Sample  Tests  (n*100) 


the  densities.  This  Is  not  as  true  for  the  unifora.  This 
could  be  the  reason  for  the  improved  pover  of  the  prob¬ 
ability  density  function  test  over  the  Salrnov  test. 

The  power  differences  night  lead  one  to  develop  a  two- 
sample  test  which  mixes  the  results  of  both  the  tests  dis¬ 
cussed  in  this  chapter  to  obtain  an  "optimal”  test  statis¬ 
tic.  This  "optimal"  statistic  would  be  a  function  of  some 
shape  statistic,  such  as  a  modified  Hogg's  Q  or  a  modified 
kurtosis.  It  might  also  be  a  function  of  sample  size, 
although  the  sample  size  appears  to  have  only  a  small 
effect . 

The  two-sample  test  developed  in  this  chapter  shows 
clear  superiority  over  an  effective  classical  test  under 
some  circumstances.  Additional  research  in  this  area  is 
discussed  in  the  Summary  and  Recommendations. 

IV. 5  Other  Applications 

Sample  distribution  functions  have  long  been  used  in 
the  areas  of  non-p a ra m e t r 1 c  goo d n e s s -o f - f 1 t  tests,  two 
sample  tests,  and  tests  for  equality  of  some  moment  or 
other  property  of  the  distribution  underlying  a  sample. 
The  density  function  has  seen  little  use  in  this  area  for 
two  reasons: 

1)  It  is  difficult  to  calculate  a  estimate  of  the 
density,  with  the  nice  properties  that  already  exist  for 
the  sample  distribution  function. 


2)  There  has  beea  no  particular  motivation  to  calcu¬ 


late  the  density. 

The  first  reason  is  a  valid  one.  In  the  case  of  the 
secoad  reason  there  should  have  been  some  motivation, 
since  the  density  estimate  can  provide  additional 
information  which  is  particularly  useful  in  small  sample 
s ituations  . 

In  order  to  see  the  utility  of  the  density  function, 
and  grasp  the  idea  that  further  information  is  provided, 
it  is  instructive  to  consider  a  simple  case.  The 
Ko 1 mogo ro v-S m 1 rno v  statistic  (or  any  of  several  related 
statistics)  is  frequently  used  in  various  non-parametric 
tests.  This  statistic  is  defined  as: 

SF  -  sup  | Fj(x)  -  F2(x) | 

K 

If  there  were  no  further  Information  to  be  gained  from  the 
density  function  then  one  would  expect  that  a  similar 
statistic  defined  for  the  density  function, 

Sf  -  sup  | f !<x)  -  f2(x) | 

would  be  determined  at  the  same  point,  x,  as  Sp.  That  is, 
if  we  consider  continuous  functions  and  define  the  points 

(xx)  -  f 2(xj) | 

104 


•  v. 


where  the  maxima  occur  as: 


‘ 2 


then  Xj  -  x2.  This  Is  not  true  In  general.  Consider  the 

A  A 

case  where  -  Fj  Is  a  continuous  function.  Then  we  know 

A  A 

that  at  max  | F ^  -  F2  |  , 

d/dx  ( Fj  -  F2)  *  fx  -  f2  -  0 

If  the  maximum  Is  interior  to  the  Interval  of  definition. 

A  A 

We  also  know  that  P(Fj  -  F2  ■  0  for  all  x)  *  0  Implies 
P( Sf»0)  *  0  and  P(xj)  *  0  for  a  continuous  distribution 
and  any  single  point  (consider  an  endpoint),  x^,  thus 

P(x1*x~)  -  P(FX  -  F2  )  +  P( x i“x2“x1)  -  0 

so  the  point  associated  with  S F  Is  not  the  same  as  the 
point  associated  with  S  ^ ,  and  different  information  Is 
obtained  with  each  of  the  functions. 

Based  upon  this  reasoning,  it  seems  safe  to  conclude 
that  any  test  which  has  been  performed  with  the  sample 
distribution  function  can  be  improved  by  using  the  sample 
density  function  estimate  derived  here.  This  is  not  to 
imply  that  sample  distribution  function  methods  should  be 
discarded,  but  rather  that  they  should  be  augmented  by  a 
similar  procedure  using  the  density  function  estimate. 
The  quality  of  such  tests  has  already  been  demonstrated. 
Good ne s s -o f - f  1 1  tests  and  multiple  sample  tests  are 
logical  extensions  of  this  work. 


[*] 


V.  Guidelines  for  Using  the  Estimator. 


One  always  has  the  choice  of  using  a  parametric  or 
non-p a ra me t r  1  c  method  of  estimating  the  density.  The 
choice  is  really  whether  or  not  one  has  enough  knowledge 
to  be  able  to  select  the  distributional  form  a  priori. 
The  alternative  is  to  rely  upon  the  "gods  of  chance"  and 
let  the  data  determine  the  distributional  form  and 
"parameters".  Using  a  non-parametrlc  method  is,  in  a 
sense,  taking  out  an  insurance  policy.  The  premium  will 
determine  whether  or  not  the  policy  is  cost  effective. 

Classical  estimation  relies  on  the  choice  of  a  set  of 
parameters  from  an  assumed  underlying  probability  dis¬ 
tribution.  In  general,  the  underlying  distribution  is 
selected  by  e  x  t  r  a  -  ma  t  he  m  a  t  i  ca  1  means  and  is  done  sepa¬ 
rately  from  the  parameter  estimation.  As  Fisher  (52) 
stated  in  1922,  when  discussing  the  problem  of  specifica¬ 
tion  (as  he  called  the  selection  of  an  underlying  density 
function)  : 


"As  regards  problems  of  specification, 
these  are  entirely  a  matter  for  the  practical 
statistician,  for  those  cases  where  the 
qualitative  nature  of  the  hypothetical  popu¬ 
lation  is  known  do  not  involve  any  problems  of 
this  type.  In  other  cases  we  may  know  by 
experience  what  forms  are  likely  to  be  suit¬ 
able,  and  the  adequacy  of  our  choice  may  be 
tested  a  posteriori.  We  must  confine  ourselves 
to  those  forms  which  we  know  how  to  handle,  or 
for  which  any  tables  which  may  be  necessary 
have  been  constructed." 


Unfortunately,  we  are  often  not  blessed  with  the  In¬ 
sight  necessary  to  correctly  select  the  underlying  distri¬ 
bution.  Thus,  attempts  have  been  made  to  determine  the 
form  of  a  sample  ( goodnes s -o f - f 1 t )  or  to  protect  against 
incorrect  assumptions  by  making  the  procedures  robust 
(  5,  8,22,23,35,53,55,66,67,72,79,81,82,83,86,109,127,135, 
163,198).  Traditional  goodnes s -o f -f 1 t  tests  have  low  power 
for  small  samples.  There  are  penalties  to  pay  for  any 
form  of  protection  derived  by  making  the  estimate  more 
robust,  just  as  there  are  penalties  for  assuming  the  wrong 
density  form  and  blindly  applying  classical  procedures. 

Probably  the  most  common  way  to  analyze  data  Is  to 
estimate  the  parameters  of  a  distribution  using  an 
assumed  probability  law.  Classical  methods  of  parameter 
estimation  abound,  Including  the  maximum  likelihood  method 
and  the  method  of  moments.  Frequently,  certain  proper¬ 
ties  of  estimators  are  required,  such  as:  unbiasedness, 
invariance, or  linearity.  These  further  restrict  the  class 
of  estimators  considered  and  one  can  usually  define  a 
"best”  estimator  within  a  certain  class.  All  of  these 
methods  assume  a  parametric  model  of  the  data. 

An  analysis  of  the  new  density  estimator  presented 
here  shows  one  what  the  "premiums”  are  for  using  this 
estimator  rather  than  a  maximum  likelihood  parametric 
estimator.  (Although  the  maximum  likelihood  estimate  Is 
not  necessarily  the  "best",  It  Is  well  known  and  easily 


calculated  for  comparison  purposes.)  The  specific  estima¬ 
tors  used  were: 

Uniform  [  a ,b ] : 

a  -  x{1)  b 

Normal  ( n,a  )  : 

H  -  x  a2  -  1/a 

Double  Exponential  (p,a): 

A  A  A 

M  *  median{x^}  a  * 

One  hundred  samples  of  size  100  and  of  size  ten  were 
generated  from  each  of  the  three  distributions,  all  with 
zero  mean  and  unit  variance.  The  approximate  MISE  was 
calculated  for  both  the  cumulative  distribution  function 
and  the  probability  density  function  using  both  estima¬ 
tors.  The  medians  of  the  errors  calculated  were  used  to 
generate  the  tables  in  this  section. 

First  the  ratio  of  the  errors: 

„  m  Estimator  Error _ 

Max  Likelihood  Error 

was  calculated  for  each  of  the  cases.  These  data  are 
presented  in  Tables  15  and  16.  The  obvious  conclusion  is 


*  x(n ) 


(xi'x)2 

i-1 


i/h  ^  lxil 

i-1 


Table  15 


CDF  Median  Error  Ratios  (ASE) 


Sample  Size 

Uniform 

Norma  1 

Laplace 

10 

1.405 

.8769 

2.234 

100 

6.913 

.8553 

10.2  1 

Table  16  - 

Pdf  Median  Error  Ratios 

(ASE) 

Sample  Size 

Uniform 

Normal 

Lap  la  c  e 

10 

1.133 

n 

1.581 

100 

1.097 

ra 

7.425 

that  for  small  sample  sizes  the  penalty  for  using  the  new 
non -p a r a m e t r  i  c  estimator,  even  when  one  suspects  the 
underlying  distribution,  may  not  be  too  large.  It  is  also 
Interesting  to  note  that  the  non-pa rametric  estimate  for 
the  normal  cumulative  distribution  function  is  consis¬ 
tently  better  than  the  parametric  maximum  likelihood 
estimate.  The  non-parametric  estimate  resulted  in  lower 
error  for  74%  of  the  samples. 

Since  the  entries  in  Tables  15  and  16  are  ratios,  a 
decision  of  whether  or  not  one  can  accept  the  degradation 
in  the  estimate  is  dependent  upon  the  actual  error  values. 
Tables  17  and  18  show  the  median  error  values  calculated 


for  the  same  samples.  In  general,  the  errors  are  quite 


Table  17  -  Median  MISE  for  Estimates  of  CDF 


Sample  Size 

Estimate 

Uniform 

Norma  1 

Laplace 

10 

New 

.0081  1 

.00667 

.00748 

10 

Max  Lik 

.004  A  3 

.00960 

.00394 

100 

New 

.00030 

.00054 

.00270 

100 

Max  Lik 

.00004 

.00075 

.0002  1 

Table  18  -  Median  MISE  for  Estimates  of  Pdf 


Sample  Size 

Es  t ima  te 

Uniform 

Normal 

Laplace 

10 

New 

.0135 

.0101 

.0261 

10 

Max  Lik 

.0103 

.0072 

.0146 

100 

New 

.0011 

.0011 

.0066 

100 

Max  Lik 

.0008 

.0008 

.0008 

low  for  many  applications,  thus  one  may  be  willing  to  pay 
the  "premium"  for  using  the  non-parametr ic  estimate. 

The  alternative  is  to  select  the  underlying  distribu¬ 
tion  in  advance.  One  may  potentially  pay  a  different 
penalty  in  this  case,  that  of  selecting  the  wrong  distri¬ 
bution.  Tables  19,  20,  21,  and  22  show  the  comparable 
errors  for  selecting  the  wrong  distributions  in  these 


cases. 


Table  19  -  Median  CDF  MISE  for  Maximum 
Likelihood  Estimate  n  -  10 


Actual  Distribution 


Assumed 

Distribution 


Uniform 

Normal 

Laplace 

1 

.00443 

.02447 

.03772 

Normal 

.02364 

.00960 

.02738 

Laplace 

- 

.03311 

.01999 

.00394 

Table  20  -  Median  Pdf  MISE  for  Maximum 
Likelihood  Estimate  n  *  10 


Actual  Distribution 


As  s  umed 
Distribution 


Uniform 

Normal 

Laplace 

Uniform 

.0103 

.1003 

.1305 

Normal 

.0866 

.0072 

.095  1 

Laplace 

.112  1 

.0888 

.0146 

Table  21  -  Median  CDF  MISE  for  Maximum 
Likelihood  Estimate  n  =  100 


Actual  Distribution 


Assumed 

Distribution 


Uniform 

Normal 

Lap  lace 

Uniform 

.00004 

.03333 

.05805 

Normal 

.03790 

.00075 

.00367 

Laplace 

.04988 

.00471 

.00021 

Table  22 


Median  Pdf  MISE  for  Maximum 
Likelihood  Estimate  n  *  100 


Ass  umed 
Distribution 


Actual  Distribution 

Uniform 

Normal 

Lap  lace 

Uniform 

.0008 

.0588 

.0900 

Normal 

.0421 

.0008 

.0077 

Laplace 

.0847 

.0079 

.0008 

One  may  be  tempted  to  say  that  the  Q  discriminant  pro¬ 
babilities  in  Tables  1  and  2  coupled  with  the  errors  in 
Tables  19  through  22  could  lead  to  an  adaptive  parametric 
estimator  with  smaller  expected  error  than  the  non-para- 
metric  estimator.  This  is  true;  however,  one  is  seldom, 
if  ever,  in  the  situation  where  a  selection  between  only 
these  distributions  needs  to  be  made.  In  the  real  world  a 
selection  must  be  made  from  a  continuous  space  of  distri¬ 
butions.  The  parametric  estimator  will  pay  a  penalty 
based  upon  the  distance  between  the  assumed  distribution 
and  the  actual  distribution.  The  non-pa ra me t r i c  estimator 
is  more  likely  to  pay  a  penalty  related  to  some  measure  of 
the  shape  of  the  underlying  distribution. 

The  question  of  what  estimator  to  use  ultimately 
depends  its  specific  use.  However,  for  many  cases  where 
there  is  uncertainty  about  the  underlying  distribution, 
and  where  the  sample  size  is  small,  the  risk  in  using  the 


non-parametric  procedure  outlined  in  this  paper  is  smaller 
than  the  risk  associated  with  using  a  maximum  likelihood 
estimate. 


VI.  Suaaary  and  Recoaaendatlona. 

A  new  n  o  n  -  p  a  r  a  m  e  t  r  i  c  density  estimate  has  been 
developed  which  has  the  following  properties: 

1)  It  is  continuous  and  piecewise  linear. 

2)  It  converges  to  the  true  density  function  If  the 
true  density  has  no  more  than  a  finite  number  of  discon¬ 
tinuities  of  a  form  where  the  value  at  the  discontinuity 
can  be  considered  the  average  of  the  limiting  values  on 
either  side  of  the  discontinuity. 

3)  It  requires  no  user  supplied  parameters. 

The  estimator  is  shown  to  have  significantly  better 
error  properties,  for  certain  classes  of  distributions, 
than  existing  density  estimators.  The  quality  of  the 
estimate  is  discussed,  tabulated  and  graphically  demon¬ 
strated.  Applications,  including  parameterization,  small 
sample  analysis,  and  two  sample  tests  are  presented. 
These  newly  developed  applications  are  shown  to  Improve 
upon  the  generally  accepted  existing  techniques.  Guide¬ 
lines  for  choosing  a  density  estimation  method  along  with 
a  discussion  of  an  approach  to  method  selection  are  pre¬ 
sented  . 

Research  opportunities  in  the  field  of  density  estima¬ 
tion  and  its  applications  have  been  expanded  by  this 
research.  In  particular,  the  applications  shown  have 
demonstrated  the  utility,  versatility,  and  strengths  of 


density  based  techniques.  Some  of  the  possible  extensions 
of  this  effort  are: 

1)  Extension  of  the  technique  to  multivariate  density 
estimation. 

2)  More  exhaustive  analysis  of  the  two  sample  test 
should  be  made  to  better  bound  the  critical  values  and 
power  of  the  test.  Theoretical  developments  in  this  area 
may  be  feasible. 

3)  Some  of  the  endpoint  estimation  techniques  show 
promise  as  tail  length  discriminators.  Additional  re¬ 
search  along  these  lines  could  lead  to  better  methods  of 
tail  classification  and  support  definition. 

4)  Goodness-of-f it  tests  using  the  same  technique  as 
the  two-sample  test  should  be  more  powerful  against  some 
alternatives  than  existing  tests.  If  used  in  conjunction 
with  existing  tests,  they  should  always  increase  the 
power . 

5)  New  techniques  of  searching  the  objective  space  in 
minlmun  distance  estimation  could  lead  to  more  effective 
parameterization  of  the  density.  At  least  a  four  param¬ 
eter  family  is  probably  necessary  to  cover  unlmodal 
densities.  Search  time  is  prohibitively  expensive  using 
the  scheme  presented  here  to  find  a  global  minimum  of  the 
distance  norm  in  four  parameter  space  sod  evaluate  it 
thoroughly . 
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